Crosslinguistic Corpus of Hesitation Phenomena, pilot project (CCHPp)

The Crosslinguistic Corpus of Hesitation Phenomena pilot project (CCHPp) was a small-scale corpus collected and analyzed to test the validity and potential usefulness of its design for a larger-scale corpus project. The defining feature of the corpus is its crosslinguistic design. Many studies of second language speech have analyzed the speech of language learners, some of them across several competency levels. However, very few have also included first language speech to establish a baseline speech pattern for each participant. As a result, many observed effects could simply reflect individual speech patterns, or even group patterns when the participants in a study came from the same or similar language backgrounds.

The CCHPp addresses this limitation of previous studies by collecting speech from the same speakers in both their first and second languages. The corpus contains recordings of 10 native speakers of Japanese performing parallel speaking tasks in their first language and in English, their second language. Participants were asked to read a text aloud, to describe picture sequences depicting a simple story, and to talk freely about a given topic.

Notable results include further support for the correlations between second language competence and speech rate, silent pause rate, and silent pause duration. However, once first language speaking patterns are taken into account, the correlations for speech rate and silent pause rate disappear; only silent pause duration reliably corresponds with second language competence. In addition, as a novel result, filled pause duration (but not filled pause rate) also corresponds with second language competence.
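Why a correlation can vanish once first language patterns are accounted for can be illustrated with a minimal Python sketch. It assumes one simple way of accounting for the baseline: a per-speaker difference score (the L2 value of a measure minus the same speaker's L1 value), correlated with competence using Pearson's r. The source does not say which method the CCHPp analyses used, and the function names (`pearson`, `baseline_adjusted`, `compare`) and all numbers below are hypothetical, chosen only to show the effect.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def baseline_adjusted(l1_values, l2_values):
    """Hypothetical adjustment: each speaker's L2 value minus their own L1 value."""
    return [l2 - l1 for l1, l2 in zip(l1_values, l2_values)]


def compare(competence, l1_values, l2_values):
    """Return (raw r, baseline-adjusted r) between competence and a speech measure."""
    raw = pearson(competence, l2_values)
    adjusted = pearson(competence, baseline_adjusted(l1_values, l2_values))
    return raw, adjusted


# Hypothetical data for four speakers: more competent speakers also happen to
# talk faster in their L1, and each speaker's L2 rate tracks their own L1 rate.
competence = [1, 2, 3, 4]
l1_rate = [3.0, 3.5, 4.0, 4.5]
l2_rate = [2.0, 2.6, 3.1, 3.5]

raw, adjusted = compare(competence, l1_rate, l2_rate)
# raw is about 0.996; adjusted is 0 up to floating-point error
print(f"raw r = {raw:.3f}, baseline-adjusted r = {adjusted:.3f}")
```

In this toy data the raw L2 speech rate correlates strongly with competence only because it inherits each speaker's L1 rate; the difference score removes that shared component. A regression with the L1 measure as a covariate would be another way to control for the baseline; the difference score is used here only because it maps directly onto the idea of a per-speaker baseline.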

Because this corpus was organized as a pilot study, participants were not asked to consent to the public release of their recordings or data. There are therefore no plans to release them in the near future. However, interested readers can find summary statistics and analyses in the various presentations and papers based on this corpus.