Distributed computing framework
We consider a query optimization problem for iterative queries in a distributed environment. Our technique, OptIQ, removes redundant computations across iterations by extending the traditional techniques of view materialization and incremental view evaluation. First, OptIQ decomposes an iterative query into invariant and variant views and materializes the invariant view; reusing this materialized view across iterations removes redundant computations. Second, OptIQ incrementally evaluates the variant view, skipping the evaluation of tuples that have already converged. We verify the effectiveness of OptIQ in both MapReduce and Spark environments using PageRank and k-means clustering queries on real datasets.
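The two ideas above can be illustrated on PageRank with a small single-machine sketch. This is not the OptIQ implementation: it is plain Python rather than MapReduce or Spark, the function names `build_invariant_view` and `pagerank_incremental` are hypothetical, and a delta-propagation form of PageRank is used here as a stand-in for incremental view evaluation. The edge weights (the invariant part) are computed once and reused, while only nodes whose scores are still changing (the variant part) are re-evaluated.

```python
from collections import defaultdict


def build_invariant_view(edges):
    """Materialize the part that never changes between iterations:
    for each source node, its targets and the weight 1/out-degree."""
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    view = defaultdict(list)
    for src, dst in edges:
        view[src].append((dst, 1.0 / out_degree[src]))
    return view


def pagerank_incremental(edges, damping=0.85, tol=1e-10, max_iters=1000):
    """Delta-based PageRank that skips nodes whose pending change is tiny.
    Assumes every node has at least one outgoing edge; the rank mass of
    dangling nodes would simply be dropped by this sketch."""
    nodes = sorted({n for edge in edges for n in edge})
    view = build_invariant_view(edges)  # computed once, reused every iteration
    rank = {n: 0.0 for n in nodes}
    pending = {n: (1.0 - damping) / len(nodes) for n in nodes}

    for _ in range(max_iters):
        # Variant part: only nodes with a non-negligible pending change.
        active = [n for n in nodes if abs(pending[n]) > tol]
        if not active:
            break
        for n in active:
            delta = pending[n]
            pending[n] = 0.0
            rank[n] += delta
            for dst, weight in view[n]:
                pending[dst] += damping * delta * weight

    # Fold the small leftover changes into the final scores.
    for n in nodes:
        rank[n] += pending[n]
    return rank


ranks = pagerank_incremental([("a", "b"), ("b", "c"), ("c", "a")])
```

On the three-node cycle in the last line, every node ends up with a score close to 1/3, as expected by symmetry. In a distributed setting, the materialized view would presumably be cached across jobs rather than held in a local dictionary.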
Graph mining framework
Recent advances in information science have shown that linked data pervade our society and the natural world around us. Graphs have become increasingly important for representing complicated structures and schema-less data such as those generated by Wikipedia, Freebase, and various social networks. However, existing algorithms cannot handle large graphs efficiently, so fast algorithms are needed. We introduce two fast algorithms: one for identifying the top-k nodes of personalized PageRank and one for graph clustering. They outperform previous algorithms in terms of both speed and quality. Because personalized PageRank and graph clustering are fundamental to many applications, our algorithms allow those applications to be processed more efficiently and will help to improve the effectiveness of future applications. We are also considering a graph mining framework with which programmers can easily integrate a variety of graph mining algorithms to obtain hidden knowledge from large-scale graph data.
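To make the top-k personalized PageRank task concrete, here is a minimal baseline sketch. It uses plain power iteration followed by a top-k selection, which is the straightforward approach and not the fast algorithm described above. The function names, the adjacency-dictionary input format, and the restart probability of 0.15 are assumptions made for illustration; every node is assumed to appear as a key of `adj`.

```python
import heapq


def personalized_pagerank(adj, seed, restart=0.15, iters=100):
    """Power iteration for random walk with restart from `seed`.
    `adj` maps each node to the list of its out-neighbors."""
    score = {n: 0.0 for n in adj}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        nxt[seed] += restart  # the walker jumps back to the seed
        for u, neighbors in adj.items():
            walk_mass = (1.0 - restart) * score[u]
            if not neighbors:
                nxt[seed] += walk_mass  # dangling node: return to the seed
                continue
            share = walk_mass / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        score = nxt
    return score


def top_k_nodes(adj, seed, k):
    """Return the k nodes with the highest personalized PageRank score."""
    score = personalized_pagerank(adj, seed)
    best = heapq.nlargest(k, score.items(), key=lambda item: item[1])
    return [node for node, _ in best]


graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
top3 = top_k_nodes(graph, "a", 3)
```

This baseline computes a score for every node before selecting the top k, so its cost grows with the size of the whole graph. That full computation is the kind of work a faster top-k method would aim to avoid.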
We are developing conversation systems that understand what people say and how they feel, and that react naturally, as humans do. Our particular target is systems capable of everyday conversation with people (called chatbots). We are interested in both search-based and generation-oriented approaches.
Search-based: Search for and select appropriate utterances from large-scale data of previous conversations.
Generation-oriented: Generate utterances from scratch using various machine-learning approaches.
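The search-based approach can be illustrated with a toy sketch: given a user utterance, find the most similar past utterance in stored conversation data and return the reply that followed it. The bag-of-words cosine similarity, the function names, and the two-line conversation log are all assumptions for illustration; a real system would use far larger data and stronger matching models.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent an utterance as lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_reply(user_utterance, conversation_pairs):
    """Return the stored reply whose preceding utterance best matches the input.
    `conversation_pairs` is a list of (utterance, reply) tuples."""
    query = bag_of_words(user_utterance)
    best_pair = max(
        conversation_pairs,
        key=lambda pair: cosine(query, bag_of_words(pair[0])),
    )
    return best_pair[1]


# Hypothetical conversation log used only for this example.
log = [
    ("how are you today", "I am fine, thanks."),
    ("what is your favorite food", "I love ramen."),
]
reply = retrieve_reply("what food do you like", log)
```

A generation-oriented system would instead produce the reply word by word from a trained model, so it is not limited to utterances that already exist in the data.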
We are collaborating with the Rinna team at Microsoft Japan as part of the MSRA CORE project.
Language barriers are a challenge for people: they cause miscommunication and misunderstanding. We tackle this challenge through research on machine translation, with machine translation based on neural networks as our primary target. We are particularly interested in translation from major languages (such as English and Japanese) into minor languages.
Language Learning Support
Everyone knows that learning a language is important but tough. We are developing systems that support language learners as well as language teachers. So far, we have developed a system that automatically judges a writer's English writing level and a lexical simplification method that helps teachers adjust the level of their educational materials.