Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

Corpus of Hesitation Phenomena

The Corpus of Hesitation Phenomena is a corpus of speech that has been transcribed and annotated for various speech features, most notably types of hesitation phenomena (e.g., silent pauses, filled pauses, repairs). It is in fact a family of three corpora, as follows.

Corpus of Hesitation Phenomena (CHP)

A corpus of recorded speech by four native speakers of English responding freely to various questions in an interview format. Annotations include hesitation phenomena, word and syllable boundaries, tone unit boundaries with tone choice marking, and turn durations.

Crosslinguistic Corpus of Hesitation Phenomena, pilot version (CCHPp)

A corpus of recorded speech by 10 native speakers of Japanese, responding to parallel speaking tasks in both their first language and English (their second language). Annotations include hesitation phenomena, word and pause interval durations, formant measurements on filled pauses, as well as fluency and accent ratings by native speakers of English.

Crosslinguistic Corpus of Hesitation Phenomena (CCHP)

A corpus of recorded speech by 30 native speakers of Japanese, responding to parallel speaking tasks in both their first language and English (their second language). Annotations (will) include hesitation phenomena, syllable and pause interval durations, formant measurements on filled pauses and vowels, clause boundary markings, and fluency ratings by native speakers of English.

Those who are interested in the most current corpus and would like to download it should visit the CCHP page.