What is a Filled Pause?

[This post was written for a blog I started in 2007 but discontinued soon after. Since that blog no longer exists and the content is relevant here, I've uploaded it here with its original time stamp.]

One of my main research interests is filled pauses in speech (and more recently in writing). I intend to blog a lot about it here, although not exclusively. Nonetheless, in order to get the ball rolling in the context of this blog, I'd like to start by trying to give a definition of a filled pause. This is one of those problems that seems easy at first. For instance, once I say that filled pauses are things like um and uh in speech, then pretty much everybody knows just what I'm talking about (with strong intuitions from their own personal experience). However, when we approach this problem formally, it turns out not to be so easy. I have a hard time coming up with a nice objective definition of a filled pause, because I almost always end up introducing some subjectivity into it. For instance, one possibility might go as follows.

Definition 1: A filled pause is a conventional—though non-word—expression used to stall for time during the processing of spontaneous speech.

This definition doesn't satisfy me because once we start to say how a filled pause is used, then we're getting at the intentions of the speaker, and it is very difficult to say definitively what a speaker's intentions are. Somebody might utter um reflexively as a means of stalling for time, but perhaps somebody might utter um intentionally because they want me to think that they are stalling for time (or to make some other inference about their intentions). So, ideally, I'd like to have a formal definition that doesn't depend on the subjectivity of speakers' intentions. The definition of filled pauses at Wikipedia (there called fillers) manages to leave out the subjectivity.

Definition 2: [F]illers are sounds or words that are spoken to fill up gaps in utterances.

This definition, although it sounds nice, has a logical problem: It presupposes that the utterances contain gaps in the first place. But how can we know that an utterance contains a gap if what it actually contains is a filled pause? It's important to remember that speech production is not like, say, road maintenance. Over time, a new road may develop potholes which may then later be filled. That would be a filled gap. However, a gap in an utterance, once it occurs, may not subsequently be filled. It forever remains a gap. Similarly, a filled pause in an utterance forever remains a filled pause. [Note: I am aware that the same logic can be applied to show the problematic nature of the term filled pause, since it presupposes there was a pause there in the first place. However, the terminology has been established for some time now and will be difficult to change.] Another idea might be to focus merely on the phonological aspect of filled pauses. Consider this:

Definition 3: A filled pause is expressed as one of the following phoneme sequences: /ə/, /əm/.

Any bilingual speaker will of course realize that one problem with this definition is that it applies only to English. Other languages have different phoneme sequences that make up filled pauses. In Spanish, it is /ɛstɛ/; in French, it is /œm/; and in Japanese it is /ɛ:to/ or /ɑno:/. But this could be solved by simply stipulating in the definition that the sequences are language-specific and then just listing what they are for each language. There are still, however, two problems that remain. First, not every occurrence of these phoneme sequences is a filled pause. In English, for instance, the word some /səm/ contains the target sequence /əm/, but is almost certainly not to be counted as a filled pause. We would not want a speech recognition engine to make this mistake. So we could just say that filled pauses are (part of) the input stream left over after all the words have been detected. But this then brings us to the second problem: Some speech segments may contain ambiguity because of homophony between filled pauses and real words. For example, the following pair of sentences may be expressed with the exact same sequence of phonetic symbols.

  1. I ate a rabbit. /aI eIɾ ə ræbIt/
  2. I ate uh rabbit. /aI eIɾ ə ræbIt/
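
Both problems with Definition 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inventory FILLED_PAUSE_FORMS and both function names are hypothetical, and the transcription is assumed to arrive already split into word-sized chunks by spaces, which real speech does not do for us.

```python
# Hypothetical inventory of English filled-pause forms, taken from Definition 3.
FILLED_PAUSE_FORMS = {"ə", "əm"}


def naive_matches(transcription):
    """Return every chunk that contains a filled-pause form anywhere inside it."""
    return [
        chunk
        for chunk in transcription.split()
        if any(form in chunk for form in FILLED_PAUSE_FORMS)
    ]


def whole_chunk_matches(transcription):
    """Return only the chunks that consist entirely of a filled-pause form."""
    return [chunk for chunk in transcription.split() if chunk in FILLED_PAUSE_FORMS]


# Problem 1: the word "some" contains the target sequence.
some = "səm"

# Problem 2: the two example sentences have identical transcriptions
# (copied verbatim from the pair above).
sentence_1 = "aI eIɾ ə ræbIt"  # I ate a rabbit.
sentence_2 = "aI eIɾ ə ræbIt"  # I ate uh rabbit.

if __name__ == "__main__":
    print(naive_matches(some))
    print(whole_chunk_matches(some))
    print(whole_chunk_matches(sentence_1) == whole_chunk_matches(sentence_2))
```

Matching whole chunks instead of substrings no longer flags some, but it returns the same result for both sentences, so it cannot tell the article in sentence 1 from the filled pause in sentence 2.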

While there's not a whole lot of difference between the two sentences (for instance, 1 entails 2), nevertheless they do have different propositional content, and in certain contexts, that difference could be crucial to communication. Yet another possibility is to focus on the semantic value of a filled pause.

Definition 4: A filled pause is an element of speech that makes no contribution to the semantic proposition of the utterance which contains it.

This isn't too bad, but it is far too broad. There are many other things that would pass this definition but that we would not want to regard as filled pauses. Interjections and particles are a good example. There is no difference in the propositional content of the following three utterances.

  1. He's strong!
  2. Gosh, he's strong!
  3. He's strong, man!

Thus, while the empty semantic value definition is surely part of the answer, the definition still needs other things to give it the proper specificity and diagnostic power. In short, it seems that some elements of all of the above definitions are necessary. Here is one more proposal.

Definition 5: A filled pause is a semantically empty element of speech which fits a language-specific conventional phonetic form and delays (either intentionally or not) the transfer of the speaker's message.
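
To make the three parts of this definition concrete, here is a minimal sketch of it as a Python check. Everything named here is an assumption made for illustration: the Element class, the CONVENTIONAL_FORMS table (filled in with the forms mentioned earlier in this post), and the is_filled_pause function are hypothetical, and the two boolean judgments are assumed to be supplied by a human annotator, since the definition does not say how to compute them.

```python
from dataclasses import dataclass

# Hypothetical table of conventional forms, using the examples given in this post.
CONVENTIONAL_FORMS = {
    "English": {"ə", "əm"},
    "Spanish": {"ɛstɛ"},
    "French": {"œm"},
    "Japanese": {"ɛ:to", "ɑno:"},
}


@dataclass
class Element:
    form: str                 # phonemic form of the element
    language: str             # language of the utterance
    semantically_empty: bool  # annotator judgment: adds nothing to the proposition
    delays_message: bool      # annotator judgment: delays the speaker's message


def is_filled_pause(element):
    """Apply all three criteria of Definition 5 together."""
    forms = CONVENTIONAL_FORMS.get(element.language, set())
    return (
        element.semantically_empty
        and element.form in forms
        and element.delays_message
    )


# "um": passes all three criteria.
um = Element("əm", "English", semantically_empty=True, delays_message=True)
# "Gosh": semantically empty, but its form (transcription assumed) is not conventional.
gosh = Element("ɡɑʃ", "English", semantically_empty=True, delays_message=True)
# The article "a": has the right form, but is not semantically empty.
article_a = Element("ə", "English", semantically_empty=False, delays_message=False)

if __name__ == "__main__":
    for element in (um, gosh, article_a):
        print(element.form, is_filled_pause(element))
```

The interjection is ruled out by the form criterion and the article by the semantic criterion, which is the work the combined definition is meant to do. The hard part stays outside the code: deciding, for a given stretch of speech, what the two judgments should be.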

I'm still not sure I'm satisfied with this definition, but it seems to capture the good points of the previous three definitions without introducing much other baggage. Feel free to comment on it. In addition, though, keep looking for later posts which may refine this concept further. Like most academic questions, this topic is not yet closed.