More CCHP files added to archive

The files for five more participants have been added to the Crosslinguistic Corpus of Hesitation Phenomena (CCHP) archive.  The new files include wav, mp3, annotated xml and plain text transcripts of the participants' speech.  Participant collections (single files containing all files for each participant) have also been uploaded.  However, other collections (e.g., by language, by task, etc.) have not been updated.  Those who wish to have the complete current corpus should download the participant collection files one by one.

Filler words and filled pauses: Are they literally the same?

Social media spent some bandwidth last week flogging away at Vice-President Joe Biden's prolific use of 'literally' in his address to the Democratic Convention. Frankly, I don't really have much problem with this. The alternate use of 'literally' as an intensifier as opposed to a literal antonym of 'figuratively' is not some recent neologism. As Ben Zimmer points out at Language Log, this usage has been around since the 18th century. I won't go into the details of all that since the big guns at LL have already done the work.

And anyway, this is the Filled Pause Research Center.  So what's the relevance here?  Well, James Taranto, the Wall Street Journal columnist, took on the task of analyzing the ten different instances of 'literally' in Biden's speech.  Here's part of his contribution to the Biden brouhaha.

What makes this exercise even funnier is the fact that the word "literally" does not appear once--literally!--in the prepared text. All 10 "literallys" were extemporaneous. When Biden says "literally," it seems, he means "uh."

Crosslinguistic Corpus of Hesitation Phenomena (CCHP) First Release!

The Filled Pause Research Center is pleased to announce the initial release of Crosslinguistic Corpus of Hesitation Phenomena (CCHP) materials. This release includes audio files (wav and mp3) and transcripts (annotated xml and plain text) for six participants.  The transcription process is still ongoing. Thus, transcripts in this release do not yet contain time markings and there are no Praat TextGrid files yet.

Those who wish to access the corpus are asked to create a new account in the FPRC.  After doing so, they can access the corpus archive on the CCHP main page. Registered users may then download the entire corpus (as released so far), download sub-collections of the corpus, or browse and download individual files in the corpus.

Deception and the use of filled pauses

[Image: Meet the Parents (2000, Universal Pictures), lie detector scene]

I was browsing through Lifehacker the other day and found a recent posting entitled "Spot Liars by Paying Attention to Their Reaction Within the First Five Seconds of a Conversation" by Thorin Klosowski. This seemed interesting so I started reading and came across the following.

We've talked a lot about the different types of clues you can watch out for when trying to detect a lie, including, the various, scientifically proven methods, the fact many liars begin a sentence with "well", how liars often use filler words like "um" or "ah", and how body language might reveal a liar.

First Stage of Transcription of CCHP Recordings has begun

I met with the research support staff recently to go over the procedures for transcribing all the recordings. There are roughly 9 hours of recordings to be transcribed in several stages. The first stage, which is probably the most arduous, is to transcribe all the words in each recording, delimiting them minimally into utterances. In addition, the staff will transcribe all overt hesitation phenomena including filled pauses, false starts, repair sequences, and repeats (see Taxonomy for some details).

Recording of 30 participants has been completed

Today we completed recording the last of the 30 scheduled participants for the Crosslinguistic Corpus of Hesitation Phenomena. For the most part, things went smoothly. There were only two minor cases where a participant's recording had to be re-done (i.e., one three-minute speaking task in both cases). The only real difficulty we had was that a number of students (about 1 in 5) who had volunteered to participate were no-shows. I had to do a second listing on the university job list to fill out the quota. In the end, though, we got our 30.

Poster presentation at IWoLP

I recently gave a poster presentation at the International Workshop on Language Production (IWoLP 2012) held at New York University, July 18-20. The conference was quite interesting. It definitely was oriented more toward (neuro)psychology than linguistics (i.e., lots of fMRI studies, few grammaticality judgment surveys), yet the program was still large enough that there was something for everyone.

Recording participants for CCHP has started

The recording process for the Crosslinguistic Corpus of Hesitation Phenomena starts today. The plan is to record 30 native speakers of Japanese speaking in both their native language and English (their second language). I posted a recruitment ad on the university part-time work list and within a day had 30 students lined up. That work list is a fantastic way to recruit experimental participants.

CCHP research support staff have been hired

The budget for the first year of the CCHP allows for three research support staff. They will be responsible for running experiments (i.e., recording participants) and doing transcription and annotation. After a brief search, I've been able to find three graduate students at the university to hire. They will assist in the project during 2012-2013.

[This post was actually written 10 August 2012, but has been back-dated to correspond to the actual events described herein.]

Recording studio equipment has arrived

The first order of business in getting prepared for the CCHP recording process was to acquire the necessary recording equipment. My department managed to get a little extra space last academic year and we got the facilities people to spend some of their leftover budget at the end of the fiscal year to outfit a research laboratory for us with a soundproof room in it. Well, it's not soundproof, but it will certainly cut out a lot of the background noise that caused problems in the pilot study.