
Identifying Spoken Speech in a Children's Story

We define quoted speech to be any quote-annotated text segment within the body of a story. ESPER detects quoted speech in a given story and labels the extracted speech using CSML. The following is an example of the quoted-speech CSML markup:

<QUOTE TYPE="NEW"> `Come, there's no use 
in crying like that!'</QUOTE> said 
Alice to herself, rather sharply;
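As a rough illustration of this kind of markup, the following Python sketch wraps `...' style quotations in QUOTE tags. It is not ESPER's implementation: the regular expression, the function name tag_quotes, and the fixed TYPE="NEW" attribute are assumptions made for the example only.

```python
import re

# Assumed pattern (not taken from ESPER): an opening backtick, then the
# shortest run of text that ends at an apostrophe NOT followed by a letter.
# The lookahead keeps the apostrophe in "there's" from closing the quote.
# DOTALL lets a quotation run across line breaks.
QUOTE_RE = re.compile(r"`(.+?)'(?![A-Za-z])", re.DOTALL)


def tag_quotes(text: str) -> str:
    """Wrap each `...' span in CSML-style QUOTE tags (TYPE is hard-coded)."""
    return QUOTE_RE.sub(
        lambda m: '<QUOTE TYPE="NEW">' + m.group(0) + "</QUOTE>", text
    )


sample = (
    "`Come, there's no use in crying like that!' said Alice to herself, "
    "rather sharply;"
)
tagged = tag_quotes(sample)
print(tagged)
```

The heuristic is deliberately simple. A plural possessive such as "the boys' toys" inside a quotation would end the match too early, so a real system would need a more careful treatment of apostrophes.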

It is worth noting that even though the content of these stories is child-oriented, the structure of the stories can nevertheless be quite complex. Hence ESPER also takes into consideration such structures as nested quoted speech; this can occur when a character in a story is narrating a story of his or her own, with its own set of characters and quoted speech, essentially creating a story within a story. For example:

The farmer described it to his wife: 
<QUOTE TYPE="NEW"> "The tail-feathers
of the fowl were very short, and it 
winked with both its eyes, and said 
<QUOTE TYPE="NEW"> "Cluck, cluck."</QUOTE> 
What were the thoughts of the fowl as it 
said this I cannot tell you..." </QUOTE>
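
One way to handle nesting with straight double quotes, where the opening and closing marks are the same character, is to keep a depth counter and guess the role of each mark from its neighbours. The sketch below assumes that a mark preceded by whitespace and followed by a non-space character opens a quotation, and that any other mark closes the innermost open one. This rule, the function name tag_nested_quotes, and the fixed TYPE="NEW" attribute are illustrative assumptions, not a description of ESPER's method.

```python
def tag_nested_quotes(text: str) -> str:
    """Tag straight-double-quoted spans, allowing quotes inside quotes."""
    out = []
    depth = 0  # number of quotations currently open
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev_ch = text[i - 1] if i > 0 else " "
        next_ch = text[i + 1] if i + 1 < len(text) else " "
        # Assumed heuristic: whitespace before and text after means "opening".
        opens = prev_ch.isspace() and not next_ch.isspace()
        if opens:
            depth += 1
            out.append('<QUOTE TYPE="NEW">"')
        elif depth > 0:
            depth -= 1
            out.append('"</QUOTE>')
        else:
            out.append(ch)  # stray closing mark: leave it untouched
    return "".join(out)


sample = (
    'The farmer described it to his wife: "The tail-feathers of the fowl '
    'were very short, and it winked with both its eyes, and said '
    '"Cluck, cluck." What were the thoughts of the fowl as it said this '
    'I cannot tell you..."'
)
nested = tag_nested_quotes(sample)
print(nested)
```

Because every closing mark is paired with the most recently opened quotation, the inner "Cluck, cluck." span is closed before the farmer's own speech is.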
We tested the quoted-speech identification module in ESPER on a single story where all quoted speech had been hand-annotated. The story was selected from a collection of stories not included in ESPER's development corpus.

Table 1: Quoted-Speech Identification Evaluated on Little Women, Chap. 1, by L.M. Alcott
Recall    Precision
100%      94%


The results show that ESPER was able to correctly identify all the hand-annotated spoken speech in the story. However, the comparatively lower precision is attributable to the fact that ESPER does not discriminate between actual spoken speech and quoted labels. Consider the following:

In this case, the quoted text seems to represent a label. Although it may be reasonable to synthesize such labels using a special voice to make the story more interesting, we must nevertheless distinguish them from actual pieces of speech so as not to mistakenly assign an arbitrary character's voice, which would confuse the listener. In the future, we would like to automatically differentiate such labels from quoted speech and synthesize them using a pre-defined voice (such as that of the narrator or the main character).
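
The relationship between the two figures in Table 1 can be illustrated with a short Python sketch of how recall and precision are conventionally computed over sets of identified spans. The span offsets below are made-up toy values, not data from the evaluation, and the function name precision_recall is hypothetical. The toy data only reproduces the qualitative pattern: every hand-annotated span is found, and one extra span (standing in for a quoted label) lowers precision.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Return (precision, recall) for predicted spans against gold spans."""
    true_pos = len(predicted & gold)  # spans found by both
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Toy (start, end) character offsets, invented for illustration only.
gold = {(0, 10), (20, 35), (50, 62)}
predicted = {(0, 10), (20, 35), (50, 62), (70, 75)}  # last one: a quoted label

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these toy values, recall is 1.0 because no gold span is missed, while precision is 0.75 because one of the four predicted spans is not actual speech.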


Alan W Black 2003-10-20