The Phrase Alignment project aims to extract phrasal paraphrases with syntactic structures by developing methods for phrase alignment between sentential paraphrase pairs, and to explore applications of these methods. Paraphrases are a valuable resource for various applications, such as conversation systems, question answering, and data augmentation for NLP. However, conventional studies on paraphrase collection have paid little attention to the syntactic structures of paraphrases, especially phrasal paraphrases. The lack of structural information makes paraphrasal knowledge difficult to generalize and hinders applications of paraphrases.


Yuki Arase, Graduate School of Information Science and Technology, Osaka University
Junichi Tsujii, Artificial Intelligence Research Center, AIST


  1. Y. Arase and J. Tsujii: Transfer Fine-Tuning: A BERT Case Study, in Proc. of Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), pp. 5396–5407 (Nov. 2019).
  2. Y. Arase and J. Tsujii: SPADE: Evaluation Dataset for Monolingual Phrase Alignment, in Proc. of Language Resources and Evaluation Conference (LREC 2018) (May 2018).
  3. Y. Arase and J. Tsujii: Monolingual Phrase Alignment on Parse Forests, in Proc. of Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), pp. 1–11 (Sept. 2017).

Datasets and Models

We have created a dataset for evaluating phrase alignment that
  • provides ground-truth tree structures (PTB and HPSG)
  • provides ground-truth phrase alignments
for 201 paraphrasal sentence pairs.
The dataset is available from LDC (LDC2018T09).
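
As a rough illustration of how ground-truth phrase alignments can be used for evaluation, the sketch below scores predicted alignments against gold alignments by exact-match precision, recall, and F1. The span representation, the function name alignment_prf, and the toy data are assumptions made for illustration only; they do not reflect the dataset's file format or the evaluation metrics defined in the publications above.

```python
from typing import Set, Tuple

# Hypothetical representation (not the dataset's file format):
# a phrase is a (start, end) token span with an exclusive end, and an
# alignment pairs a span in the source sentence with a span in the target.
Span = Tuple[int, int]
Alignment = Tuple[Span, Span]


def alignment_prf(
    gold: Set[Alignment], predicted: Set[Alignment]
) -> Tuple[float, float, float]:
    """Return exact-match precision, recall, and F1 over aligned phrase pairs."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: three gold alignments, two predictions, one exact match.
gold = {((0, 2), (0, 3)), ((2, 5), (3, 6)), ((5, 6), (6, 7))}
predicted = {((0, 2), (0, 3)), ((2, 4), (3, 6))}
p, r, f = alignment_prf(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the strictest choice: a predicted phrase whose boundary is off by one token counts as wrong. A more lenient scheme could give partial credit for overlapping spans.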

We have shown that fine-tuning BERT models with phrasal paraphrases makes their sentence representations better suited to sentence pair modeling, i.e., paraphrase identification, semantic textual similarity assessment, and natural language inference (NLI). The BERT models fine-tuned with phrasal paraphrases are available at my GitHub page.


This project has been supported by
  • ACT-I, JST (Apr. 2017 – Mar. 2020)
  • The Japan Prize Foundation (Apr. 2017 – Mar. 2018)
  • Kayamori Foundation of Informational Science Advancement (Nov. 2016 – Dec. 2018)
  • Microsoft Research CORE program (Apr. 2015 – Mar. 2016)


If you have any inquiries, please contact: E-mail arase at
