Chinese Mandarin Lip Reading (CMLR) Dataset

Overview

The CMLR dataset was collected by the Visual Intelligence and Pattern Analysis (VIPA) group of Zhejiang University. It is designed to facilitate research on visual speech recognition, also referred to as automatic lip reading.

The dataset consists of 102,072 spoken sentences from 11 speakers, recorded between June 2009 and June 2018 from the national news program “News Broadcast”. Each sentence is at most 29 Chinese characters long and contains no English letters, Arabic numerals, or rare punctuation. The alignment boundaries of each word (in seconds) are also provided with each sentence. The dataset statistics are given in the table below.
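The text constraints described above (at most 29 characters, no English letters, no Arabic numerals) can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function name `is_valid_transcript` is hypothetical, the length limit is assumed to count every character in the string, full-width letters and digits are assumed to be excluded as well, and the "rare punctuation" rule is not checked because the source does not list which marks it covers.

```python
import re

# Upper bound on sentence length stated in the dataset description.
MAX_SENTENCE_LENGTH = 29

# Assumption: "English letters" and "Arabic numerals" cover both the ASCII
# forms and the full-width forms that can appear in Chinese text.
_DISALLOWED = re.compile(
    r"[A-Za-z0-9\uFF21-\uFF3A\uFF41-\uFF5A\uFF10-\uFF19]"
)


def is_valid_transcript(text: str) -> bool:
    """Return True if `text` satisfies the length and character constraints.

    Assumption: the 29-character limit counts every character in `text`.
    The "rare punctuation" rule is not checked here.
    """
    if not 0 < len(text) <= MAX_SENTENCE_LENGTH:
        return False
    return _DISALLOWED.search(text) is None


print(is_valid_transcript("今天天气很好"))   # True
print(is_valid_transcript("今天是2018年"))  # False (contains Arabic numerals)
```

A check like this may be useful when adding new transcripts or filtering other corpora to match the same conventions.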

Set          # sentences   # phrases   # characters
Train             71,448      22,959          3,360
Validation        10,206      10,898          2,540
Test              20,418      14,478          2,834
All              102,072      25,633          3,517
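As a quick sanity check on the table above, the following Python sketch verifies that the per-split sentence counts add up to the stated total and reports each split's share. The names `SPLIT_SENTENCES` and `split_shares` are illustrative only and are not part of the dataset.

```python
# Sentence counts copied from the statistics table.
SPLIT_SENTENCES = {"Train": 71_448, "Validation": 10_206, "Test": 20_418}
TOTAL_SENTENCES = 102_072


def split_shares(counts: dict, total: int) -> dict:
    """Return each split's share of `total` as a percentage (one decimal)."""
    if sum(counts.values()) != total:
        raise ValueError("split sizes do not add up to the stated total")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = split_shares(SPLIT_SENTENCES, TOTAL_SENTENCES)
print(shares)  # {'Train': 70.0, 'Validation': 10.0, 'Test': 20.0}
```

The sentence counts correspond to roughly a 7:1:2 split. The phrase and character columns do not add up to the "All" row in the same way, presumably because they count distinct phrases and characters, many of which occur in more than one split.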



Downloads

The CMLR dataset is available to universities and research institutes for research purposes only. Before using the CMLR dataset, please refer to the following papers:

[1] Ya Zhao, Rui Xu, and Mingli Song. A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading. ACM International Conference on Multimedia in Asia, 2019.

[2] Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, and Mingli Song. Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers. The Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.


File lists: Train, Val, Test


talker   video      audio      text
s1       download   download   download
s2       download   download   download
s3       download   download   download
s4       download   download   download
s5       download   download   download
s6       download   download   download
s7       download   download   download
s8       download   download   download
s9       download   download   download
s10      download   download   download
s11      download   download   download