Research Achievements
Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases
Hui Huang, Xuanxin Wu, Muyun Yang, Yuki Arase
ACL 2026 Workshop on Evaluating Evaluations (EvalEval)
July 2026
An In-depth Evaluation of Large Language Models in Sentence Simplification with Error-based Human Assessment
ACM Transactions on Intelligent Systems and Technology (ACM TIST)
April 2026
Volume, issue, pages: Vol. 17, No. 4
There is currently no award data.