Chinese Mandarin Lip Reading (CMLR) Dataset

Overview

The CMLR dataset was collected by the Visual Intelligence and Pattern Analysis (VIPA) group of Zhejiang University. It is designed to facilitate research on visual speech recognition, also known as automatic lip reading.

The dataset consists of 102,072 spoken sentences from 11 speakers, drawn from broadcasts of the national news program “News Broadcast” recorded between June 2009 and June 2018. Each sentence is at most 29 Chinese characters long and contains no English letters, Arabic numerals, or rare punctuation. The alignment boundaries of each word (in seconds) are also provided with each sentence. The dataset statistics are given in the table below.

Set	# sentences	# phrases	# characters
Train	71,448	22,959	3,360
Validation	10,206	10,898	2,540
Test	20,418	14,478	2,834
All	102,072	25,633	3,517
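As a quick sanity check on the figures above, the following minimal sketch confirms that the three splits add up to the stated total and computes each split's share of the sentences. It also includes a hypothetical helper, `satisfies_text_constraints`, that tests a transcript against the stated length and character constraints. The helper is an illustration only, not part of any official CMLR tooling: it assumes that the 29-character limit counts CJK ideographs in the range U+4E00 to U+9FFF, and it does not attempt to detect "rare punctuation", which the description does not define.

```python
import re

# Sentence counts per split, copied from the statistics table.
SPLIT_SENTENCES = {"Train": 71_448, "Validation": 10_206, "Test": 20_418}
TOTAL_SENTENCES = 102_072

def split_fractions(counts):
    """Return each split's share of the total number of sentences."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Assumption: sentence length is measured in CJK ideographs only.
MAX_CHARS = 29
_CJK = re.compile(r"[\u4e00-\u9fff]")
_LATIN_OR_DIGIT = re.compile(r"[A-Za-z0-9]")

def satisfies_text_constraints(sentence):
    """Hypothetical check: at most 29 Chinese characters, and no
    English letters or Arabic numerals. Punctuation is not checked."""
    n_chinese = len(_CJK.findall(sentence))
    return n_chinese <= MAX_CHARS and _LATIN_OR_DIGIT.search(sentence) is None

if __name__ == "__main__":
    # The three splits should account for every sentence in the dataset.
    assert sum(SPLIT_SENTENCES.values()) == TOTAL_SENTENCES
    for name, frac in split_fractions(SPLIT_SENTENCES).items():
        print(f"{name}: {frac:.1%}")
```

The split sizes correspond to roughly 70%, 10%, and 20% of the sentences for training, validation, and test respectively.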

Downloads

The CMLR dataset is available to universities and research institutes for research purposes only. Before using the CMLR dataset, please refer to the following papers:

[1] Ya Zhao, Rui Xu, and Mingli Song. A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading. ACM International Conference on Multimedia in Asia, 2019.

[2] Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, and Mingli Song. Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers. The Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.

Download Link: http://t.cn/A6waiog1 (Extraction code: emqx)