Paraphrase & Phrase Alignment

The Phrase Alignment project aims to extract phrasal paraphrases together with their syntactic structures by developing a method for aligning phrases between the two sentences of a sentential paraphrase pair. Paraphrases are a useful resource for various applications, such as conversation systems, language generation, web search, and machine translation. Although conventional studies have focused on collecting phrasal paraphrases, most of them have paid little attention to syntactic structure. The lack of structural information hinders the generalization of paraphrasal knowledge, so applications suffer from a shortage of that knowledge even when they have billions of paraphrases available. They also struggle to decide whether a phrase can be replaced by its paraphrase. An unacceptable replacement degrades the adequacy, fluency, and grammaticality of the original sentence.

This project sheds light on the importance of syntactic structure in paraphrases and develops an efficient method for extracting paraphrases together with their structures.


Yuki Arase, Graduate School of Information Science and Technology, Osaka University
Junichi Tsujii, Artificial Intelligence Research Center, AIST


Yuki Arase and Jun'ichi Tsujii: Monolingual Phrase Alignment on Parse Forests, Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), to appear.


We have created a dataset of paraphrasal sentence pairs that provides:
  • ground-truth tree structures (PTB and HPSG)
  • ground-truth phrase alignments

The dataset will be published for research purposes.
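
The release format of the dataset is not specified here. As a rough sketch only, a sentence pair with PTB-style trees and phrase alignments could be represented in Python as below. The `Node` class, the `parse_ptb`, `phrases`, and `aligned_phrases` helpers, the example sentences, and the encoding of an alignment as a pair of pre-order node indices are all assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A phrase-structure node: a syntactic label plus children (a leaf holds a word)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def words(self) -> List[str]:
        """Return the words covered by this node, left to right."""
        if not self.children:
            return [self.word]
        return [w for child in self.children for w in child.words()]


def parse_ptb(text: str) -> Node:
    """Parse a PTB-style bracketed string such as "(NP (DT a) (NN car))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        node = Node(label=tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return read()


def phrases(root: Node) -> List[Node]:
    """List every node (phrase) of the tree in pre-order."""
    found = [root]
    for child in root.children:
        found.extend(phrases(child))
    return found


def aligned_phrases(src: Node, tgt: Node,
                    pairs: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Turn (source index, target index) pairs into pairs of phrase strings."""
    src_nodes, tgt_nodes = phrases(src), phrases(tgt)
    return [(" ".join(src_nodes[i].words()), " ".join(tgt_nodes[j].words()))
            for i, j in pairs]


# Invented example pair; not taken from the dataset.
source = parse_ptb("(S (NP (PRP He)) (VP (VBD bought) (NP (DT a) (NN car))))")
target = parse_ptb("(S (NP (PRP He)) (VP (VBD purchased) (NP (DT an) (NN automobile))))")

# Hypothetical alignments given as pre-order node indices (source, target).
alignments = [(3, 3), (4, 4), (5, 5)]

for src_phrase, tgt_phrase in aligned_phrases(source, target, alignments):
    print(f"{src_phrase}  <->  {tgt_phrase}")
```

Indexing phrases by tree node rather than by token span keeps the syntactic label attached to each aligned phrase, which is the kind of structural information the project argues is missing from conventional paraphrase collections.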


