HESITA(tions) in Portuguese: a database

Publication Type: Conference Paper


The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013), Volume 54, Number 1, Stockholm, Sweden, p.13-16 (2013)






Keywords: annotation, disfluency, DiSS, hesitation corpus, hesitations, prepared speech, spontaneous speech


This paper presents HESITA, a European Portuguese database of hesitations in speech. The database contains annotations of hesitation events such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were identified in 30 daily news programs collected from the podcasts of a Portuguese television channel. The database also includes speaking-style classification, acoustic information and annotations of other speech events. A statistical analysis of how often the hesitation events occur is presented. Because it records how Portuguese speakers hesitate, the database can provide insights into the process of human speech communication. The HESITA database is freely available online to the research community.


KTH Royal Institute of Technology; August 21-23, 2013