Contents
Part I Keynote Talks
Looking at the Interaction Management with New Eyes – Conversational Synchrony and Cooperation using Eye Gaze . . . 3
Kristiina Jokinen
Interacting with Purpose (and Feeling!): What Neuropsychology and the Performing Arts Can Tell Us About ‘Real’ Spoken Language Behaviour . . . 5
Roger K. Moore
Part II Speech Recognition and Semantic Analysis
Accessing Web Resources in Different Languages Using a Multilingual Speech Dialog System . . . 9
Hansjorg Hofmann and Andreas Eberhardt and Ute Ehrlich
1 Introduction . . . 9
2 System Architecture . . . 10
3 Information Extraction from Semi-structured Web Sites . . . 11
4 Generic Speech Dialog . . . 12
5 System Prototype . . . 12
6 Conclusions . . . 14
References . . . 14
New Technique for Handling ASR Errors at the Semantic Level in Spoken Dialogue Systems . . . 17
Ramon Lopez-Cozar, Zoraida Callejas, David Griol and Jose F. Quesada
1 Introduction . . . 17
2 Related Work . . . 18
3 The Proposed Technique . . . 19
3.1 Creation of an initial correction model . . . 20
3.2 Optimisation of the initial correction model . . . 20
4 Experiments . . . 23
4.2 Language models for ASR . . . 25
4.3 Results . . . 25
5 Conclusions and Future Work . . . 27
References . . . 27
Combining Slot-based Vector Space Model for Voice Book Search . . . 31
Cheongjae Lee and Tatsuya Kawahara and Alexander Rudnicky
1 Introduction . . . 31
2 Related Work . . . 32
3 Data Collection . . . 32
3.1 Backend Database . . . 32
3.2 Query Collection using Amazon Mechanical Turk . . . 33
4 Book Search Algorithm . . . 33
4.1 Baseline Vector Space Model (VSM) . . . 34
4.2 Multiple VSM . . . 34
4.3 Hybrid VSM and a Back-off Scheme . . . 36
5 Search Evaluation . . . 36
5.1 Experiment Set-up . . . 36
5.2 Evaluation on Textual Queries . . . 37
5.3 Evaluation on Noisy Queries . . . 38
6 Conclusion and Discussion . . . 38
References . . . 38
Preprocessing of Dysarthric Speech in Noise Based on CV–Dependent Wiener Filtering . . . 41
Ji Hun Park, Woo Kyeong Seong, and Hong Kook Kim
1 Introduction . . . 41
2 CV–Dependent Wiener Filter . . . 42
2.1 CV–Classified VAD . . . 42
2.2 CV–Dependent Wiener Filter . . . 44
3 Performance Evaluation . . . 45
4 Conclusion . . . 46
References . . . 46
Conditional Random Fields for Modeling Korean Pronunciation Variation . . . 49
Sakriani Sakti, Andrew Finch, Chiori Hori, Hideki Kashioka, Satoshi Nakamura
1 Introduction . . . 49
2 Speech Recognition Framework . . . 50
3 Conditional Random Field Approach . . . 51
4 CRFs Feature Set . . . 51
5 Experimental Evaluation . . . 52
6 Conclusions . . . 54
7 Acknowledgements . . . 54
References . . . 54
An Analysis of the Speech Under Stress Using the Two-Mass Vocal Fold Model . . . 57
Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda
1 Introduction . . . 58
2 Measuring stress using glottal source . . . 59
2.1 Spectral flatness of the glottal flow . . . 59
2.2 Evaluation of Spectral Flatness Measure . . . 59
3 Simulation using two-mass model . . . 59
4 Conclusion . . . 62
5 Acknowledgements . . . 62
References . . . 62
Domain-Adapted Word Segmentation for an Out-of-Domain Language Modeling . . . 63
Euisok Chung, Hyung-Bae Jeon, Jeon-Gue Park and Yun-Keun Lee
1 Introduction . . . 63
2 Domain Adapted Word Segmentation . . . 64
2.1 Word Segmentation . . . 64
2.2 Domain Adaptation . . . 65
2.3 Unknown Word Extraction . . . 65
2.4 Incremental Domain Adaptation . . . 68
3 Experiments . . . 69
3.1 Word Segmentation Error Reduction . . . 69
3.2 Incremental Domain Adaptation Experiment . . . 70
4 Discussion . . . 71
5 Acknowledgements . . . 72
References . . . 72
Part III Multi-Modality for Input and Output
Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users’ Spontaneous Listener’s Reactions . . . 77
Teruhisa Misu, Etsuo Mizukami, Yoshinori Shiga, Shinichi Kawamoto, Hisashi Kawai and Satoshi Nakamura
1 Introduction . . . 77
2 Related Works . . . 78
3 Construction of Spoken Dialogue TTS . . . 79
3.1 Spoken Dialogue Data collection and Model Training . . . 79
3.2 Comparison Target . . . 80
3.3 Comparison of Prosodic Features of the Synthesized Speech . . . 80
4 Construction of Avatar Agent . . . 81
5 User Experiment . . . 82
5.1 Dialogue System used for Experiment . . . 82
5.2 Evaluation of TTS . . . 82
5.3 Effect of Avatar Agent . . . 87
6 Conclusions . . . 88
References . . . 88
Development of a Data-driven Framework for Multimodal Interactive Systems . . . 91
Masahiro Araki and Yuko Mizukami
1 Introduction . . . 91
2 Related Research . . . 92
2.1 Object-oriented approach for development of spoken dialogue systems . . . 92
2.2 Data-driven development of Web applications . . . 93
2.3 MMI system architecture . . . 93
3 Data-driven framework for MMI system development . . . 94
3.1 Background architecture . . . 94
3.2 Interaction level markup language . . . 94
3.3 Dialog flow description . . . 95
4 Object-oriented modeling language . . . 97
4.1 Language specification . . . 97
4.2 Rapid initial prototyping . . . 98
4.3 Extension . . . 99
5 Conclusion and Future Research . . . 100
References . . . 100
Multiparty Conversation Facilitation Strategy Using Combination of Question Answering and Spontaneous Utterances . . . 103
Yoichi Matsuyama, Yushi Xu, Akihiro Saito, Shinya Fujie and Tetsunori Kobayashi
1 Introduction . . . 103
2 Conversation Analysis . . . 105
3 Framework . . . 105
4 Dialogue Management . . . 107
4.1 Topic Tracing . . . 107
4.2 Question Answering . . . 108
4.3 Dialogue Actions . . . 108
4.4 Combination of Utterances . . . 109
5 Conclusions . . . 111
References . . . 111
Conversational Speech Synthesis System with Communication Situation Dependent HMMs . . . 113
Kazuhiko Iwata and Tetsunori Kobayashi
1 Introduction . . . 113
2 Speech Corpora Design . . . 114
2.1 Communication Situations . . . 115
2.2 Data Acquisition . . . 116
3 System Overview . . . 117
3.1 HMM Training on Situation Dependent Speech Corpora . . . 117
3.2 System Configuration . . . 118
4 Evaluation . . . 119
4.1 Experimental Setup . . . 119
4.2 Experimental Result . . . 120
5 Conclusion . . . 121
References . . . 122
An Event-Based Conversational System for the Nao Robot . . . 125
Ivana Kruijff-Korbayova, Georgios Athanasopoulos, Aryel Beck, Piero Cosi, Heriberto Cuayáhuitl, Tomas Dekens, Valentin Enescu, Antoine Hiolle, Bernd Kiefer, Hichem Sahli, Marc Schröder, Giacomo Sommavilla, Fabio Tesser and Werner Verhelst
1 Introduction . . . 125
2 Event-Based Component Integration . . . 126
3 The Integrated System . . . 126
3.1 Dialogue Manager (DM) . . . 127
3.2 Audio Front End (AFE) and Voice Activity Detection (VAD) . . . 127
3.3 Automatic Speech Recognition (ASR) . . . 127
3.4 Natural Language Understanding (NLU) . . . 128
3.5 Natural Language Generation (NLG) . . . 128
3.6 Text-To-Speech Synthesis (TTS) . . . 129
3.7 Gesture Recognition and Understanding (GRU) . . . 129
3.8 Non-Verbal Behavior Planning (NVBP) & Motor Control (MC) . . . 130
4 Experience from Experiments and Conclusions . . . 130
References . . . 131
Towards Learning Human-Robot Dialogue Policies Combining Speech and Visual Beliefs . . . 133
Heriberto Cuayáhuitl, Ivana Kruijff-Korbayova
1 Introduction . . . 133
2 Learning Human-Robot Dialogues Under Uncertainty . . . 134
3 Using Bayesian-Relational State Representations for Optimizing Human-Robot Dialogues . . . 134
4 Experiments and Results . . . 135
4.1 The Simulated Conversational Environment . . . 135
4.2 Characterization of the Learning Agent . . . 136
4.3 Experimental Results . . . 138
5 Conclusion and Future Work . . . 139
References . . . 139
Part IV User Modelling
JAM: Java-based Associative Memory . . . 143
Robert Propper, Felix Putze, Tanja Schultz
1 Introduction . . . 143
2 Related Work . . . 144
3 Architecture . . . 146
3.1 Knowledge Structure . . . 146
3.2 Memory Dynamics . . . 147
4 Implementation . . . 148
5 Evaluation . . . 150
5.1 Survey . . . 150
5.2 Conversation . . . 152
References . . . 154
Conversation Peculiarities of People with Different Verbal Intelligence . . . 157
Kseniya Zablotskaya, Umair Rahim, Sergey Zablotskiy, Steffen Walter, Wolfgang Minker
1 Introduction . . . 158
2 Method . . . 158
2.1 Corpus Collection . . . 158
2.2 Feature Extraction . . . 159
3 Experiments and Results . . . 160
4 Discussions and Future Work . . . 161
References . . . 162
Merging Intention and Emotion to Develop Adaptive Dialogue Systems . . . 165
Zoraida Callejas, David Griol, Ramon Lopez-Cozar, Gonzalo Espejo,
1 Introduction and related work . . . 165
2 Our proposal . . . 166
2.1 The emotion recognizer . . . 166
2.2 The intention recognizer . . . 167
3 The enhanced UAH dialogue system . . . 168
4 Experiments . . . 170
5 Conclusions and future work . . . 173
6 Acknowledgments . . . 174
References . . . 174
All Users Are (Not) Equal – The Influence of User Characteristics on Perceived Quality, Modality Choice and Performance . . . 175
Ina Wechsung, Matthias Schulz, Klaus-Peter Engelbrecht, Julia Niemann and Sebastian Möller
1 Introduction . . . 175
2 Related Work . . . 176
3 Method . . . 177
3.1 Experimental Set-Up . . . 178
3.2 Measures . . . 178
3.3 Procedure . . . 179
4 Results . . . 180
4.1 Factors Influencing Performance . . . 180
4.2 Factors Influencing Modality Choice . . . 181
4.3 Factors Influencing Quality Perceptions . . . 182
5 Discussion and Conclusion . . . 184
References . . . 184
Part V Dialogue Management
Parallel Computing and Practical Constraints when applying the Standard POMDP Belief Update Formalism to Spoken Dialogue Management . . . 189
Paul A. Crook, Brieuc Roblin, Hans-Wolfgang Loidl and Oliver Lemon
1 Introduction . . . 189
1.1 Paper Structure . . . 190
2 Background . . . 191
2.1 Typical SDS . . . 191
2.2 POMDP Formalism . . . 191
3 Related Literature . . . 192
3.1 Dialogue Response Time . . . 192
4 POMDP DM . . . 193
4.1 Fixed Users’ Goals during Dialogues . . . 195
5 Methodology . . . 195
6 Dense POMDP Belief Updates . . . 196
7 Limits for SDS DM . . . 197
7.1 Practical Systems . . . 198
8 Conclusions . . . 198
References . . . 200
Ranking Dialog Acts using Discourse Coherence Indicator for Language Tutoring Dialog Systems . . . 203
Hyungjong Noh, Sungjin Lee, Kyungduk Kim, Kyusong Lee, Gary Geunbae Lee
1 Introduction . . . 204
2 Related Work . . . 205
3 Discourse Coherence Indicator for Dialog Acts . . . 206
3.1 Necessity of Discourse Coherence Indicator . . . 206
3.2 Formulation . . . 207
4 Similarity between discourse histories . . . 209
4.1 Enhanced Levenshtein Distance . . . 209
4.2 Using DCI . . . 209
4.3 Using Discount Rate Parameter . . . 210
4.4 Ranking Score Normalization . . . 210
5 Experimental Results . . . 210
5.1 Discounted Cumulative Gain . . . 211
5.2 Task Completion Rate . . . 212
5.3 Diversity of System Dialog Acts . . . 213
6 Conclusion and Future Work . . . 213
References . . . 214
On-line detection of task incompletion for spoken dialog systems using utterance and behavior tag N-gram vectors . . . 215
Sunao Hara, Norihide Kitaoka, and Kazuya Takeda
1 Introduction . . . 215
2 Spoken dialog corpus of a music retrieval task . . . 217
3 Feature construction from dialogs . . . 218
3.1 Encoding utterances and behaviors as tags . . . 218
3.2 Construction of tag N-gram feature . . . 218
3.3 Construction of interaction parameter features . . . 219
3.4 Training classifiers based on SVM . . . 220
4 Detection of task-incomplete dialogs . . . 220
4.1 Evaluation of off-line detection . . . 220
4.2 Evaluation of on-line detection performance . . . 222
5 Conclusion . . . 223
References . . . 224
Integration of Statistical Dialog Management Techniques to Implement Commercial Dialog Systems . . . 227
David Griol, Zoraida Callejas, Ramon Lopez-Cozar
1 Introduction . . . 227
2 Our Proposal to Introduce Statistical Methodologies in Commercial Dialog Systems . . . 229
2.1 Implementation by means of the Standard VoiceXML . . . 231
2.2 User Simulation to Learn the Dialog Model . . . 232
3 Development of a Railway Information System using the Proposed Technique . . . 233
4 Evaluation of the Developed Dialog System . . . 235
5 Conclusions and Future Work . . . 237
References . . . 237
A Theoretical Framework for a User-Centered Spoken Dialog Manager . . . 241
Stefan Ultes, Tobias Heinroth, Alexander Schmitt, Wolfgang Minker
1 Introduction . . . 241
2 Related Work . . . 242
3 Motivation . . . 243
4 Theoretical Framework . . . 244
5 Conclusion . . . 245
6 Acknowledgement . . . 245
References . . . 246
Using probabilistic logic for dialogue strategy selection . . . 247
Ian O’Neill, Philip Hanna, Anbu Yue and Weiru Liu
1 Adaptive Dialogue . . . 247
2 The Experiment and its Evaluation . . . 249
3 Conclusions . . . 252
References . . . 252
Starting to Cook a Coaching Dialogue System in the Olympus framework . . . 255
Joana Paulo Pardal and Nuno J. Mamede
1 Introduction . . . 255
2 COOKCOACH . . . 256
2.1 OLYMPUS/RAVENCLAW framework . . . 257
2.2 Interaction design . . . 259
2.3 Recipes Model and OntoChef . . . 260
2.4 Acquiring recipes . . . 261
2.5 Interface Design . . . 263
2.6 Cook Tutor . . . 263
3 Pursuing ontology-based dialogue systems . . . 263
3.1 Reasoning . . . 264
3.2 Learning . . . 264
3.3 New Systems . . . 265
4 Conclusion . . . 265
References . . . 266
Part VI Evaluation Strategies and Paradigms
Performance of an Ad-hoc User Simulation in a Formative Evaluation of a Spoken Dialog System . . . 271
Klaus-Peter Engelbrecht, Stefan Schmidt, Sebastian Möller
1 Introduction . . . 271
2 Experimental Data . . . 273
3 User and Speech Understanding Models . . . 274
4 Creating a List of Usability Problems from Real User Data . . . 275
5 Problem Discovery in the Simulated Corpora . . . 277
6 Preparation of Data for Log File Inspection . . . 279
7 Discussion . . . 280
8 Conclusions and Future Work . . . 282
References . . . 282
Adapting Dialogue to User Emotion – A Wizard-of-Oz study for adaptation strategies . . . 285
Gregor Bertrand, Florian Nothdurft, Wolfgang Minker, Harald Traue and Steffen Walter
1 Introduction . . . 286
2 Related Work . . . 286
3 Our Experiment . . . 287
3.1 Goals of the Experiment . . . 287
3.2 Prior Questionnaires . . . 288
3.3 Description of the Experiment . . . 289
3.4 Significance of Results for Dialogue Modeling . . . 291
4 Conclusion . . . 293
5 Acknowledgements . . . 293
References . . . 293
SpeechEval: A Domain-Independent User Simulation Platform for Spoken Dialog System Evaluation . . . 295
Tatjana Scheffler, Roland Roller and Norbert Reithinger
1 Introduction . . . 295
2 Related Work . . . 295
3 End-To-End User Simulation . . . 296
4 Real-Life Systems, Quick Prototyping . . . 298
5 Conclusion . . . 299
References . . . 300
Evaluating User-System Interactional Chains for Naturalness-oriented Spoken Dialogue Systems . . . 301
Etsuo Mizukami and Hideki Kashioka
1 Introduction . . . 301
2 Methods: Annotation Schemes . . . 302
2.1 Dialogue Action Coding . . . 302
2.2 Response Evaluation Coding . . . 303
3 Use cases: sightseeing guidance system . . . 305
3.1 About the system . . . 305
3.2 Dialogue data . . . 306
4 Results and Analysis . . . 306
4.1 Reliability of Coding scheme . . . 307
4.2 Evaluating appropriateness . . . 307
4.3 Evaluating the Interactional Sequence . . . 310
5 Discussion . . . 310
6 Conclusions . . . 311
References . . . 311
Evaluation of Spoken Dialogue System that uses Utterance Timing to Interpret User Utterances . . . 315
Kazunori Komatani, Kyoko Matsuyama, Ryu Takeda, Tetsuya Ogata, and Hiroshi G. Okuno
1 Introduction . . . 315
2 Enumeration Subdialogue using Utterance Timing . . . 317
2.1 Interpretation using Utterance Timing . . . 317
2.2 Switching into Enumeration Subdialogue . . . 318
3 System for Experiment . . . 319
4 Dialogue Experiment . . . 321
4.1 Experimental Set up and Condition . . . 321
4.2 Experimental Results . . . 323
5 Conclusion . . . 324
References . . . 325
How context determines perceived quality and modality choice. Secondary task paradigm applied to the evaluation of multimodal interfaces . . . 327
Ina Wechsung, Robert Schleicher and Sebastian Möller
1 Introduction . . . 327
2 Related Work . . . 328
3 Method . . . 331
4 Results . . . 334
5 Discussion and Conclusion . . . 337
References . . . 338
Part VII Prototypes and Products
Design and Implementation of a Toolkit for Evaluation of Spoken Dialogue Systems Designed for AmI Environments . . . 343
Nieves Abalos, Gonzalo Espejo, Ramon Lopez-Cozar, Zoraida Callejas and David Griol
1 Introduction . . . 343
2 The toolkit . . . 344
2.1 Automatic orthographic transcriber (AOT) . . . 345
3 Mayordomo system . . . 348
3.1 The Mayordomo corpus . . . 349
3.2 Interaction with the user simulator . . . 350
4 Experiments . . . 351
5 Conclusions and future work . . . 353
References . . . 354
A Dialogue System for Conversational NPCs . . . 357
Tina Kluwer, Peter Adolphs, Feiyu Xu and Hans Uszkoreit
1 Introduction . . . 357
2 NPCs in the Virtual World . . . 358
3 The Dialogue System Architecture . . . 359
4 Input Analysis and Interpretation . . . 360
5 Dialogue Flow . . . 361
6 Conclusion . . . 362
References . . . 362
Embedded Conversational Engine for Natural Language Interaction in Spanish . . . 365
Marcos Santos-Perez, Eva Gonzalez-Parada and Jose Manuel Cano-Garcia
1 Introduction . . . 365
2 Conversational Agents Overview . . . 366
3 Conversational Engine . . . 367
3.1 Lemmatizer . . . 367
3.2 Object-oriented DataBase . . . 370
4 Test Environment . . . 371
5 Tests and Results . . . 371
6 Conclusion . . . 372
7 Acknowledgments . . . 372
References . . . 373
Adding Speech to a Robotics Simulator . . . 375
Graham Wilcock and Kristiina Jokinen
1 Introduction . . . 375
2 Pyro Robotics . . . 377
3 Spoken Interaction . . . 378
4 Spoken Dialogues . . . 379
5 Further Work . . . 380
References . . . 380
Index . . . 381