The CMLR dataset was collected by the Visual Intelligence and Pattern Analysis (VIPA)
group of Zhejiang University. It was designed to facilitate research on visual speech
recognition, also referred to as automatic lip reading.
The dataset consists of 102,072 spoken sentences from 11 speakers, recorded between June 2009 and June 2018
from the national news program “News Broadcast”. Each sentence is at most 29 Chinese characters long and
contains no English letters, Arabic numerals, or rare punctuation. The alignment boundaries of each word (in
seconds) are also provided for every sentence. The dataset statistics are given in the table below.
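The sentence constraints described above can be sketched as a simple check. This is an illustrative sketch only, not part of the dataset tooling: the function name `matches_cmlr_constraints` and the constant `MAX_CHARS` are hypothetical, the length is assumed to count every character in the sentence, and "rare punctuation" is not checked because the description does not define it.

```python
import re

# Stated upper bound on sentence length, in Chinese characters.
MAX_CHARS = 29

# ASCII letters and ASCII (Arabic) digits, which the description says are excluded.
# Assumption: full-width letters and digits are not considered here.
_FORBIDDEN = re.compile(r"[A-Za-z0-9]")


def matches_cmlr_constraints(sentence: str) -> bool:
    """Return True if the sentence is non-empty, has at most MAX_CHARS
    characters, and contains no English letters or Arabic numerals."""
    text = sentence.strip()
    return 0 < len(text) <= MAX_CHARS and _FORBIDDEN.search(text) is None


if __name__ == "__main__":
    print(matches_cmlr_constraints("今天天气很好"))   # True
    print(matches_cmlr_constraints("今天是2018年"))   # False: contains digits
```

A check of this kind could be useful when preparing additional transcripts that are meant to follow the same conventions as the dataset.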
The CMLR dataset is available to universities and research institutes for research
purposes only. Before using the CMLR dataset, you are advised to refer to the following papers:
 Ya Zhao, Rui Xu, and Mingli Song. A Cascade Sequence-to-Sequence Model for
Mandarin Lip Reading. ACM International Conference on Multimedia in Asia 2019
 Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, Mingli Song. Hearing
Lips: Improving Lip Reading by Distilling Speech Recognizers. The Thirty-Fourth AAAI Conference on Artificial
Notice: Chrome users may encounter the "mixed content downloads are blocked" problem.
In that case, please try another web browser, or right-click the download link, copy
the link address, and paste it into the address bar of a new tab.