Proceedings of the Paralinguistic Information and Its Integration in Spoken Dialogue Systems Workshop (2011)


Contents

Part I Keynote Talks

Looking at the Interaction Management with New Eyes – Conversational Synchrony and Cooperation using Eye Gaze . . . 3

Kristiina Jokinen

Interacting with Purpose (and Feeling!): What Neuropsychology and the Performing Arts Can Tell Us About ‘Real’ Spoken Language Behaviour . . . 5

Roger K. Moore

Part II Speech Recognition and Semantic Analysis

Accessing Web Resources in Different Languages Using a Multilingual Speech Dialog System . . . 9

Hansjorg Hofmann and Andreas Eberhardt and Ute Ehrlich

1 Introduction . . . 9

2 System Architecture . . . 10

3 Information Extraction from Semi-structured Web Sites . . . 11

4 Generic Speech Dialog . . . 12

5 System Prototype . . . 12

6 Conclusions . . . 14

References . . . 14

New Technique for Handling ASR Errors at the Semantic Level in Spoken Dialogue Systems . . . 17

Ramon Lopez-Cozar, Zoraida Callejas, David Griol and Jose F. Quesada

1 Introduction . . . 17

2 Related Work . . . 18

3 The Proposed Technique . . . 19

3.1 Creation of an initial correction model . . . 20

3.2 Optimisation of the initial correction model . . . 20

4 Experiments . . . 23

4.2 Language models for ASR . . . 25

4.3 Results . . . 25

5 Conclusions and Future Work . . . 27

References . . . 27

Combining Slot-based Vector Space Model for Voice Book Search . . . 31

Cheongjae Lee and Tatsuya Kawahara and Alexander Rudnicky

1 Introduction . . . 31

2 Related Work . . . 32

3 Data Collection . . . 32

3.1 Backend Database . . . 32

3.2 Query Collection using Amazon Mechanical Turk . . . 33

4 Book Search Algorithm . . . 33

4.1 Baseline Vector Space Model (VSM) . . . 34

4.2 Multiple VSM . . . 34

4.3 Hybrid VSM and a Back-off Scheme . . . 36

5 Search Evaluation . . . 36

5.1 Experiment Set-up . . . 36

5.2 Evaluation on Textual Queries . . . 37

5.3 Evaluation on Noisy Queries . . . 38

6 Conclusion and Discussion . . . 38

References . . . 38

Preprocessing of Dysarthric Speech in Noise Based on CV–Dependent Wiener Filtering . . . 41

Ji Hun Park, Woo Kyeong Seong, and Hong Kook Kim

1 Introduction . . . 41

2 CV–Dependent Wiener Filter . . . 42

2.1 CV–Classified VAD . . . 42

2.2 CV–Dependent Wiener Filter . . . 44

3 Performance Evaluation . . . 45

4 Conclusion . . . 46

References . . . 46

Conditional Random Fields for Modeling Korean Pronunciation Variation . . . 49

Sakriani Sakti, Andrew Finch, Chiori Hori, Hideki Kashioka, Satoshi Nakamura

1 Introduction . . . 49

2 Speech Recognition Framework . . . 50

3 Conditional Random Field Approach . . . 51

4 CRFs Feature Set . . . 51

5 Experimental Evaluation . . . 52

6 Conclusions . . . 54

7 Acknowledgements . . . 54

References . . . 54

An Analysis of the Speech Under Stress Using the Two-Mass Vocal Fold Model . . . 57

Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda

1 Introduction . . . 58

2 Measuring stress using glottal source . . . 59

2.1 Spectral flatness of the glottal flow . . . 59

2.2 Evaluation of Spectral Flatness Measure . . . 59

3 Simulation using two-mass model . . . 59

4 Conclusion . . . 62

5 Acknowledgements . . . 62

References . . . 62

Domain-Adapted Word Segmentation for an Out-of-Domain Language Modeling . . . 63

Euisok Chung, Hyung-Bae Jeon, Jeon-Gue Park and Yun-Keun Lee

1 Introduction . . . 63

2 Domain Adapted Word Segmentation . . . 64

2.1 Word Segmentation . . . 64

2.2 Domain Adaptation . . . 65

2.3 Unknown Word Extraction . . . 65

2.4 Incremental Domain Adaptation . . . 68

3 Experiments . . . 69

3.1 Word Segmentation Error Reduction . . . 69

3.2 Incremental Domain Adaptation Experiment . . . 70

4 Discussion . . . 71

5 Acknowledgements . . . 72

References . . . 72

Part III Multi-Modality for Input and Output

Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users’ Spontaneous Listener’s Reactions . . . 77

Teruhisa Misu, Etsuo Mizukami, Yoshinori Shiga, Shinichi Kawamoto, Hisashi Kawai and Satoshi Nakamura

1 Introduction . . . 77

2 Related Works . . . 78

3 Construction of Spoken Dialogue TTS . . . 79

3.1 Spoken Dialogue Data collection and Model Training . . . 79

3.2 Comparison Target . . . 80

3.3 Comparison of Prosodic Features of the Synthesized Speech . . . 80

4 Construction of Avatar Agent . . . 81

5 User Experiment . . . 82

5.1 Dialogue System used for Experiment . . . 82

5.2 Evaluation of TTS . . . 82

5.3 Effect of Avatar Agent . . . 87

6 Conclusions . . . 88

References . . . 88

Development of a Data-driven Framework for Multimodal Interactive Systems . . . 91

Masahiro Araki and Yuko Mizukami

1 Introduction . . . 91

2 Related Research . . . 92

2.1 Object-oriented approach for development of spoken dialogue systems . . . 92

2.2 Data-driven development of Web applications . . . 93

2.3 MMI system architecture . . . 93

3 Data-driven framework for MMI system development . . . 94

3.1 Background architecture . . . 94

3.2 Interaction level markup language . . . 94

3.3 Dialog flow description . . . 95

4 Object-oriented modeling language . . . 97

4.1 Language specification . . . 97

4.2 Rapid initial prototyping . . . 98

4.3 Extension . . . 99

5 Conclusion and Future Research . . . 100

References . . . 100

Multiparty Conversation Facilitation Strategy Using Combination of Question Answering and Spontaneous Utterances . . . 103

Yoichi Matsuyama, Yushi Xu, Akihiro Saito, Shinya Fujie and Tetsunori Kobayashi

1 Introduction . . . 103

2 Conversation Analysis . . . 105

3 Framework . . . 105

4 Dialogue Management . . . 107

4.1 Topic Tracing . . . 107

4.2 Question Answering . . . 108

4.3 Dialogue Actions . . . 108

4.4 Combination of Utterances . . . 109

5 Conclusions . . . 111

References . . . 111

Conversational Speech Synthesis System with Communication Situation Dependent HMMs . . . 113

Kazuhiko Iwata and Tetsunori Kobayashi

1 Introduction . . . 113

2 Speech Corpora Design . . . 114

2.1 Communication Situations . . . 115

2.2 Data Acquisition . . . 116

3 System Overview . . . 117

3.1 HMM Training on Situation Dependent Speech Corpora . . . 117

3.2 System Configuration . . . 118

4 Evaluation . . . 119

4.1 Experimental Setup . . . 119

4.2 Experimental Result . . . 120

5 Conclusion . . . 121

References . . . 122

An Event-Based Conversational System for the Nao Robot . . . 125

Ivana Kruijff-Korbayova, Georgios Athanasopoulos, Aryel Beck, Piero Cosi, Heriberto Cuayáhuitl, Tomas Dekens, Valentin Enescu, Antoine Hiolle, Bernd Kiefer, Hichem Sahli, Marc Schröder, Giacomo Sommavilla, Fabio Tesser and Werner Verhelst

1 Introduction . . . 125

2 Event-Based Component Integration . . . 126

3 The Integrated System . . . 126

3.1 Dialogue Manager (DM) . . . 127

3.2 Audio Front End (AFE) and Voice Activity Detection (VAD) . . . 127

3.3 Automatic Speech Recognition (ASR) . . . 127

3.4 Natural Language Understanding (NLU) . . . 128

3.5 Natural Language Generation (NLG) . . . 128

3.6 Text-To-Speech Synthesis (TTS) . . . 129

3.7 Gesture Recognition and Understanding (GRU) . . . 129

3.8 Non-Verbal Behavior Planning (NVBP) & Motor Control (MC) . . . 130

4 Experience from Experiments and Conclusions . . . 130

References . . . 131

Towards Learning Human-Robot Dialogue Policies Combining Speech and Visual Beliefs . . . 133

Heriberto Cuayáhuitl, Ivana Kruijff-Korbayova

1 Introduction . . . 133

2 Learning Human-Robot Dialogues Under Uncertainty . . . 134

3 Using Bayesian-Relational State Representations for Optimizing Human-Robot Dialogues . . . 134

4 Experiments and Results . . . 135

4.1 The Simulated Conversational Environment . . . 135

4.2 Characterization of the Learning Agent . . . 136

4.3 Experimental Results . . . 138

5 Conclusion and Future Work . . . 139

References . . . 139

Part IV User Modelling

JAM: Java-based Associative Memory . . . 143

Robert Propper, Felix Putze, Tanja Schultz

1 Introduction . . . 143

2 Related Work . . . 144

3 Architecture . . . 146

3.1 Knowledge Structure . . . 146

3.2 Memory Dynamics . . . 147

4 Implementation . . . 148

5 Evaluation . . . 150

5.1 Survey . . . 150

5.2 Conversation . . . 152

References . . . 154

Conversation Peculiarities of People with Different Verbal Intelligence . . . 157

Kseniya Zablotskaya, Umair Rahim, Sergey Zablotskiy, Steffen Walter, Wolfgang Minker

1 Introduction . . . 158

2 Method . . . 158

2.1 Corpus Collection . . . 158

2.2 Feature Extraction . . . 159

3 Experiments and Results . . . 160

4 Discussions and Future Work . . . 161

References . . . 162

Merging Intention and Emotion to Develop Adaptive Dialogue Systems . . . 165

Zoraida Callejas, David Griol, Ramon Lopez-Cozar, Gonzalo Espejo,

1 Introduction and related work . . . 165

2 Our proposal . . . 166

2.1 The emotion recognizer . . . 166

2.2 The intention recognizer . . . 167

3 The enhanced UAH dialogue system . . . 168

4 Experiments . . . 170

5 Conclusions and future work . . . 173

6 Acknowledgments . . . 174

References . . . 174

All Users Are (Not) Equal – The Influence of User Characteristics on Perceived Quality, Modality Choice and Performance . . . 175

Ina Wechsung, Matthias Schulz, Klaus-Peter Engelbrecht, Julia Niemann and Sebastian Möller

1 Introduction . . . 175

2 Related Work . . . 176

3 Method . . . 177

3.1 Experimental Set-Up . . . 178

3.2 Measures . . . 178

3.3 Procedure . . . 179

4 Results . . . 180

4.1 Factors Influencing Performance . . . 180

4.2 Factors Influencing Modality Choice . . . 181

4.3 Factors Influencing Quality Perceptions . . . 182

5 Discussion and Conclusion . . . 184

References . . . 184

Part V Dialogue Management

Parallel Computing and Practical Constraints when applying the Standard POMDP Belief Update Formalism to Spoken Dialogue Management . . . 189

Paul A. Crook, Brieuc Roblin, Hans-Wolfgang Loidl and Oliver Lemon

1 Introduction . . . 189

1.1 Paper Structure . . . 190

2 Background . . . 191

2.1 Typical SDS . . . 191

2.2 POMDP Formalism . . . 191

3 Related Literature . . . 192

3.1 Dialogue Response Time . . . 192

4 POMDP DM . . . 193

4.1 Fixed Users’ Goals during Dialogues . . . 195

5 Methodology . . . 195

6 Dense POMDP Belief Updates . . . 196

7 Limits for SDS DM . . . 197

7.1 Practical Systems . . . 198

8 Conclusions . . . 198

References . . . 200

Ranking Dialog Acts using Discourse Coherence Indicator for Language Tutoring Dialog Systems . . . 203

Hyungjong Noh, Sungjin Lee, Kyungduk Kim, Kyusong Lee, Gary Geunbae Lee

1 Introduction . . . 204

2 Related Work . . . 205

3 Discourse Coherence Indicator for Dialog Acts . . . 206

3.1 Necessity of Discourse Coherence Indicator . . . 206

3.2 Formulation . . . 207

4 Similarity between discourse histories . . . 209

4.1 Enhanced Levenshtein Distance . . . 209

4.2 Using DCI . . . 209

4.3 Using Discount Rate Parameter . . . 210

4.4 Ranking Score Normalization . . . 210

5 Experimental Results . . . 210

5.1 Discounted Cumulative Gain . . . 211

5.2 Task Completion Rate . . . 212

5.3 Diversity of System Dialog Acts . . . 213

6 Conclusion and Future Work . . . 213

References . . . 214

On-line detection of task incompletion for spoken dialog systems using utterance and behavior tag N-gram vectors . . . 215

Sunao Hara, Norihide Kitaoka, and Kazuya Takeda

1 Introduction . . . 215

2 Spoken dialog corpus of a music retrieval task . . . 217

3 Feature construction from dialogs . . . 218

3.1 Encoding utterances and behaviors as tags . . . 218

3.2 Construction of tag N-gram feature . . . 218

3.3 Construction of interaction parameter features . . . 219

3.4 Training classifiers based on SVM . . . 220

4 Detection of task-incomplete dialogs . . . 220

4.1 Evaluation of off-line detection . . . 220

4.2 Evaluation of on-line detection performance . . . 222

5 Conclusion . . . 223

References . . . 224

Integration of Statistical Dialog Management Techniques to Implement Commercial Dialog Systems . . . 227

David Griol, Zoraida Callejas, Ramon Lopez-Cozar

1 Introduction . . . 227

2 Our Proposal to Introduce Statistical Methodologies in Commercial Dialog Systems . . . 229

2.1 Implementation by means of the Standard VoiceXML . . . 231

2.2 User Simulation to Learn the Dialog Model . . . 232

3 Development of a Railway Information System using the Proposed Technique . . . 233

4 Evaluation of the Developed Dialog System . . . 235

5 Conclusions and Future Work . . . 237

References . . . 237

A Theoretical Framework for a User-Centered Spoken Dialog Manager . . . 241

Stefan Ultes, Tobias Heinroth, Alexander Schmitt, Wolfgang Minker

1 Introduction . . . 241

2 Related Work . . . 242

3 Motivation . . . 243

4 Theoretical Framework . . . 244

5 Conclusion . . . 245

6 Acknowledgement . . . 245

References . . . 246

Using probabilistic logic for dialogue strategy selection . . . 247

Ian O’Neill, Philip Hanna, Anbu Yue and Weiru Liu

1 Adaptive Dialogue . . . 247

2 The Experiment and its Evaluation . . . 249

3 Conclusions . . . 252

References . . . 252

Starting to Cook a Coaching Dialogue System in the Olympus framework . . . 255

Joana Paulo Pardal and Nuno J. Mamede

1 Introduction . . . 255

2 COOKCOACH . . . 256

2.1 OLYMPUS/RAVENCLAW framework . . . 257

2.2 Interaction design . . . 259

2.3 Recipes Model and OntoChef . . . 260

2.4 Acquiring recipes . . . 261

2.5 Interface Design . . . 263

2.6 Cook Tutor . . . 263

3 Pursuing ontology-based dialogue systems . . . 263

3.1 Reasoning . . . 264

3.2 Learning . . . 264

3.3 New Systems . . . 265

4 Conclusion . . . 265

References . . . 266

Part VI Evaluation Strategies and Paradigms

Performance of an Ad-hoc User Simulation in a Formative Evaluation of a Spoken Dialog System . . . 271

Klaus-Peter Engelbrecht, Stefan Schmidt, Sebastian Möller

1 Introduction . . . 271

2 Experimental Data . . . 273

3 User and Speech Understanding Models . . . 274

4 Creating a List of Usability Problems from Real User Data . . . 275

5 Problem Discovery in the Simulated Corpora . . . 277

6 Preparation of Data for Log File Inspection . . . 279

7 Discussion . . . 280

8 Conclusions and Future Work . . . 282

References . . . 282

Adapting Dialogue to User Emotion – A Wizard-of-Oz study for adaptation strategies . . . 285

Gregor Bertrand, Florian Nothdurft, Wolfgang Minker, Harald Traue and Steffen Walter

1 Introduction . . . 286

2 Related Work . . . 286

3 Our Experiment . . . 287

3.1 Goals of the Experiment . . . 287

3.2 Prior Questionnaires . . . 288

3.3 Description of the Experiment . . . 289

3.4 Significance of Results for Dialogue Modeling . . . 291

4 Conclusion . . . 293

5 Acknowledgements . . . 293

References . . . 293

SpeechEval: A Domain-Independent User Simulation Platform for Spoken Dialog System Evaluation . . . 295

Tatjana Scheffler, Roland Roller and Norbert Reithinger

1 Introduction . . . 295

2 Related Work . . . 295

3 End-To-End User Simulation . . . 296

4 Real-Life Systems, Quick Prototyping . . . 298

5 Conclusion . . . 299

References . . . 300

Evaluating User-System Interactional Chains for Naturalness-oriented Spoken Dialogue Systems . . . 301

Etsuo Mizukami and Hideki Kashioka

1 Introduction . . . 301

2 Methods: Annotation Schemes . . . 302

2.1 Dialogue Action Coding . . . 302

2.2 Response Evaluation Coding . . . 303

3 Use cases: sightseeing guidance system . . . 305

3.1 About the system . . . 305

3.2 Dialogue data . . . 306

4 Results and Analysis . . . 306

4.1 Reliability of Coding scheme . . . 307

4.2 Evaluating appropriateness . . . 307

4.3 Evaluating the Interactional Sequence . . . 310

5 Discussion . . . 310

6 Conclusions . . . 311

References . . . 311

Evaluation of Spoken Dialogue System that uses Utterance Timing to Interpret User Utterances . . . 315

Kazunori Komatani, Kyoko Matsuyama, Ryu Takeda, Tetsuya Ogata, and Hiroshi G. Okuno

1 Introduction . . . 315

2 Enumeration Subdialogue using Utterance Timing . . . 317

2.1 Interpretation using Utterance Timing . . . 317

2.2 Switching into Enumeration Subdialogue . . . 318

3 System for Experiment . . . 319

4 Dialogue Experiment . . . 321

4.1 Experimental Set up and Condition . . . 321

4.2 Experimental Results . . . 323

5 Conclusion . . . 324

References . . . 325

How context determines perceived quality and modality choice. Secondary task paradigm applied to the evaluation of multimodal interfaces . . . 327

Ina Wechsung, Robert Schleicher and Sebastian Möller

1 Introduction . . . 327

2 Related Work . . . 328

3 Method . . . 331

4 Results . . . 334

5 Discussion and Conclusion . . . 337

References . . . 338

Part VII Prototypes and Products

Design and Implementation of a Toolkit for Evaluation of Spoken Dialogue Systems Designed for AmI Environments . . . 343

Nieves Abalos, Gonzalo Espejo, Ramon Lopez-Cozar, Zoraida Callejas and David Griol

1 Introduction . . . 343

2 The toolkit . . . 344

2.1 Automatic orthographic transcriber (AOT) . . . 345

3 Mayordomo system . . . 348

3.1 The Mayordomo corpus . . . 349

3.2 Interaction with the user simulator . . . 350

4 Experiments . . . 351

5 Conclusions and future work . . . 353

References . . . 354

A Dialogue System for Conversational NPCs . . . 357

Tina Kluwer, Peter Adolphs, Feiyu Xu and Hans Uszkoreit

1 Introduction . . . 357

2 NPCs in the Virtual World . . . 358

3 The Dialogue System Architecture . . . 359

4 Input Analysis and Interpretation . . . 360

5 Dialogue Flow . . . 361

6 Conclusion . . . 362

References . . . 362

Embedded Conversational Engine for Natural Language Interaction in Spanish . . . 365

Marcos Santos-Perez, Eva Gonzalez-Parada and Jose Manuel Cano-Garcia

1 Introduction . . . 365

2 Conversational Agents Overview . . . 366

3 Conversational Engine . . . 367

3.1 Lemmatizer . . . 367

3.2 Object-oriented DataBase . . . 370

4 Test Environment . . . 371

5 Tests and Results . . . 371

6 Conclusion . . . 372

7 Acknowledgments . . . 372

References . . . 373

Adding Speech to a Robotics Simulator . . . 375

Graham Wilcock and Kristiina Jokinen

1 Introduction . . . 375

2 Pyro Robotics . . . 377

3 Spoken Interaction . . . 378

4 Spoken Dialogues . . . 379

5 Further Work . . . 380

References . . . 380

Index . . . 381