Center for Data Science and Analytics Talk by Daichi Mochihashi

Bayesian Unsupervised Word Segmentation and Beyond
Daichi Mochihashi
Date & Time
Monday, April 17, 2017, 12:00 to 13:00
603, 1555 Century Avenue, Pudong New Area, Shanghai

Many languages, including Japanese and Chinese, are written without word boundaries, so word segmentation is a crucial first step in natural language processing. However, because languages inevitably contain novel words and expressions that no dictionary covers, ordinary supervised machine learning methods cannot handle such phenomena. In this talk, I will present a completely unsupervised approach to word segmentation from a Bayesian point of view, which can recognize "words" in raw strings without any human intervention. This language model can be readily applied to any language, even an "alien" one. I will also show some recent extensions to this model, specifically for recognizing motion "words" in robotics using Gaussian processes.

Daichi Mochihashi is an associate professor at the Institute of Statistical Mathematics, Tokyo, Japan. He obtained a BS from the University of Tokyo in 1998 and a PhD from Nara Institute of Science and Technology in 2005. His research interests include statistical natural language processing and machine learning, especially nonparametric Bayesian statistics.

Professor Ryo Okui will introduce Professor Daichi Mochihashi. This event is sponsored by the Center for Data Science and Analytics.

