Speech Segmentation & Dialog Systems

Notes:

Speech segmentation is the process of identifying the boundaries between units of speech, such as words, phrases, or sentences, within an audio signal. In dialog systems, it supplies the units that later stages process and analyze to recover the meaning and structure of spoken language.

There are three broad approaches to speech segmentation: rule-based methods, machine learning-based methods, and hybrid methods that combine the two. Rule-based methods apply predefined rules or heuristics, such as a threshold on short-time energy, to locate unit boundaries. Machine learning-based methods instead learn boundary patterns from speech data and use those patterns to identify units in new input.
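As an illustration of the rule-based family, the following is a minimal sketch of an energy-threshold segmenter. It is not taken from any of the works listed below; the function names, the frame length of 160 samples, the threshold of 0.01, and the minimum run length are all assumptions chosen for the example, and a real system would tune them and usually add smoothing.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (trailing partial frame dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def segment_by_energy(
    samples: List[float],
    frame_len: int = 160,      # assumed: 10 ms at 16 kHz
    threshold: float = 0.01,   # assumed energy threshold
    min_frames: int = 2,       # assumed: drop runs shorter than this
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose frame energy exceeds the threshold."""
    segments = []
    run_start = None
    energies = frame_energies(samples, frame_len)
    # A -inf sentinel closes a run that is still open at the end of the signal.
    for i, energy in enumerate(energies + [float("-inf")]):
        if energy > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None
    return segments


# Synthetic signal: silence, a constant-amplitude burst, silence.
signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 320
print(segment_by_energy(signal))  # [(320, 800)]
```

The rule here is a single hand-set threshold. A machine learning-based method would replace that comparison with a classifier trained on labeled speech and non-speech frames, and a hybrid method might keep the run-length rule while learning the per-frame decision.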

In dialog systems, speech segmentation identifies the units of speech that must be processed in order to understand and respond to user input. For example, a dialog system might first segment a user's speech into words and phrases, and then apply natural language processing (NLP) techniques to analyze the meaning and context of those units.
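The hand-off from segmentation to NLP can be sketched as follows. This is a simplified illustration under stated assumptions, not a method from the works listed below: it assumes a recognizer has already produced words with start and end times, and the `Word` tuple layout, the function name `group_by_pause`, and the 0.5-second pause threshold are all hypothetical.

```python
from typing import List, Tuple

# Assumed input format: (text, start_sec, end_sec) for each recognized word.
Word = Tuple[str, float, float]


def group_by_pause(words: List[Word], max_gap: float = 0.5) -> List[List[str]]:
    """Group time-stamped words into phrase-like units, splitting at long pauses."""
    units: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for text, start, end in words:
        # A silence longer than max_gap closes the current unit.
        if prev_end is not None and start - prev_end > max_gap:
            units.append(current)
            current = []
        current.append(text)
        prev_end = end
    if current:
        units.append(current)
    return units


words = [
    ("book", 0.0, 0.3),
    ("a", 0.35, 0.4),
    ("flight", 0.45, 0.9),
    ("to", 1.8, 1.9),
    ("boston", 1.95, 2.4),
]
print(group_by_pause(words))  # [['book', 'a', 'flight'], ['to', 'boston']]
```

Each resulting unit could then be passed to an NLP component for analysis. Pause length alone is a crude cue, and several of the works listed below study richer cues such as prosody and dialogue act boundaries.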

Speech segmentation is an important component of dialog systems because later processing depends on it. With correctly separated units of speech, a dialog system can extract meaning and context more accurately and give more relevant responses to user input.

Wikipedia:

Speech segmentation

See also:

Word Segmentation & Dialog Systems 2017

Syll-o-matic: An adaptive time-frequency representation for the automatic segmentation of speech into syllables
N Obin, F Lamare, A Roebel – Acoustics, Speech and Signal …, 2013 – ieeexplore.ieee.org
… interaction, spoken dialogue systems). Consequently, research has mostly focus on the study of phoneme or word recognition which has lead to the development of well-established speech recognition systems (HTK [1], SPHINX [2], HTS [3]). Speech segmentation systems are …

An overview of research trends in CogInfoCom
P Baranyi, A Csapo, P Varlaki – Intelligent Engineering Systems …, 2014 – ieeexplore.ieee.org
… of agreement between interlocutors [10], [11] • non-verbal cues – eg for turn-taking management [12], or for feedback in augmented dialogue systems [13] • conversational … 22] G. Kiss, D. Sztaho, and K. Vicsi, “Language independent automatic speech segmentation into phoneme …

Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014, Proceedings
P Sojka, A Horák, I Kopeček, K Pala – 2014 – books.google.com
… One of the ambitions of the conference is, as its title says, not only to deal with dialog systems as such, but also to contribute to improving dialog between researchers in the two areas of NLP, ie, between text and speech people …

Multi-room speech activity detection using a distributed microphone network in domestic environments
P Giannoulis, A Brutti, M Matassoni… – … 2015 23rd European, 2015 – ieeexplore.ieee.org
… to far-field ASR in multi-room environments, specifically showing the crucial impact of SAD in a multi-microphone spoken dialogue system … Q = [q1, …, qT] that maximizes probability p(Q|X). The resulting state sequence represents a speech/non-speech segmentation for the …

Toward incremental dialogue act segmentation in fast-paced interactive dialogue systems
R Manuvinakurike, M Paetzel, C Qu… – Proceedings of the 17th …, 2016 – aclweb.org
… It’s important to allow users to speak naturally to spoken dialogue systems … In Manuvinakurike et al. (2016), we describe a related application of incremental speech segmentation in a variant rapid dialogue game with a different corpus …

Speech emotion recognition: a review
RB Lanjewar, DS Chaudhari – International Journal of Innovative …, 2013 – Citeseer
… This can also be used in the spoken dialogue system eg at call centre applications where the support staff can handle … PK Ghosh, A. Sarkar and TV Sreenivas, “ALCR and ESTL: Novel Temporal Features and their Application to Speech Segmentation” International Conference …

Disfluency and laughter annotation in a light-weight dialogue mark-up protocol
J Hough, L de Ruiter, S Betz, D Schlangen – 2015 – pub.uni-bielefeld.de
… Bielefeld University, *Dialogue Systems Group, **Workgroup Phonetics and Phonology julian.hough@uni-bielefeld.de … The lower agreement for laughed speech segmentation is not detrimental, as it is still good enough to provide search terms for subsequent stand-off …

Speech recognition technology: a survey on Indian languages
G Hemakumar, P Punitha – International Journal of Information …, 2013 – academia.edu
… The segmentation of continous speech signal analysis methods are Table 1. Showing techniques of speech segmentation Sl. No Domains Parameters Tasks 1 Time Energy Syllabification Endpoint detection Zero-Crossing Endpoint detection Fundamental Intonation …

Generation of suprasegmental information for speech using a recurrent neural network and binary gravitational search algorithm for feature selection
M Sheikhan – Applied intelligence, 2014 – Springer
… Prosodic features can also be used for syntactic/semantic parsing [15], dialogue processing (for example, recognition of dialogue acts and dialogue act boundaries) [16, 17], automatic dialogue systems [18, 19], language identification and speaker adaptation (to consider …

Using acoustic paralinguistic information to assess the interaction quality in speech-based systems for elderly users
H Pérez-Espinosa, J Martínez-Miranda… – International Journal of …, 2017 – Elsevier
… et al., 2008) was created to study the differences between how younger and older users talk to a spoken dialogue system, and the … We divided this stage into four activities: phenomena selection, speech segmentation, acoustic characterization, and phenomena classification …

Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
L van den Bosch – 2013 – books.google.com
… analysis (robust parsing) Table 1.2 Summary of STEVIN scientific priorities–application oriented Embedding HLTD Speech (III) 1. Information extraction from audio transcripts created by speech recognisers 2. Speaker accent and identity detection 3. Dialogue systems and Q&A …

Integration of temporal contextual information for robust acoustic recognition of bird species from real-field data
I Mporas, T Ganchev, O Kocsis… – … Journal of Intelligent …, 2013 – search.proquest.com
… include speech and audio signal processing, pattern recognition, automatic speech recognition, automatic speech segmentation and spoken … Her research interests include audio signal processing, dialogue systems, multimodal interfaces, human-computer interaction, and user …

Developing an HMM-based speech synthesis system for Malay: a comparison of iterative and isolated unit training
MB Mustafa, ZM Don, RN Ainon… – … on Information and …, 2014 – search.ieice.org
… 1. Introduction Speech technologies are now available for many languages, and they include a number of useful systems and interactive tools, such as automatic speech recognition (ASR), speech synthesis and spoken dialogue systems …

Romanian phonetic transcription dictionary for speeding up language technology development
J Domokos, O Buza, G Toderean – Language resources and evaluation, 2015 – Springer
… There is some recent research reported in the literature about using grapheme models instead of phoneme models for speech segmentation (Stan et al. 2012; Domokos et al. 2013) … Spontaneous speech recognition for Romanian in spoken dialogue systems …

Unsupervised Adaptation of ASR Systems Using Hybrid HMM/VQ Model
AA Babu, Y Ramadevi, AA Rao – Proceedings of the International …, 2014 – iaeng.org
… Feature (EEF) End point detection, and Speech segmentation Mel Frequency Cepstral Coefficient (MFCC) Perceptual Linear Prediction (PLP) Coefficients Cepstral mean subtraction (CMS) Feature extraction RASTA filtering …

A Review on Automatic Speech Recognition Architecture and Approaches
S Karpagavalli, E Chandra – International Journal of …, 2016 – pdfs.semanticscholar.org
… Synthesis, Speech Recognition, Speaker Recognition and Verification, Speech Enhancement, Speech Segmentation and Labeling (Transcription), Language Identification, Prosody, Attitude and Emotion recognition, Audio-Visual Signal Processing and Spoken Dialog Systems …

Speech-centric information processing: An optimization-oriented approach
X He, L Deng – Proceedings of the IEEE, 2013 – ieeexplore.ieee.org
… For instance, ASR and NLU subsystems in tandem form the SCIP system of SLU. When the output of SLU is further provided to a subsequent dialog control subsystem, a part of a spoken dialog system (open loop) is established … spoken dialog systems …

Romanian language voice browsing for web applications using grapheme level acoustic modeling
J Domokos, L Sándor, O Buza… – Advanced Engineering …, 2013 – Trans Tech Publ
… recent research reported in the literature about using grapheme models instead of phoneme models for speech segmentation [10] … Popescu, A. Buzo, CS Petrea, D. Ghelmez-Haneş, Spontaneous Speech Recognition for Romanian in Spoken Dialogue Systems, in Proceedings …

Speech acts annotation of everyday conversations in the ORD corpus of spoken Russian
T Sherstinova – International Conference on Speech and Computer, 2016 – Springer
… linguistics and speech technologies (eg, by providing possibility to study linguistic properties and patterns of speech acts of different types, which may be used in elaboration of human-computer spoken dialogue systems, for speech … Speech segmentation is made in ELAN …

Self-talk discrimination in human–robot interaction situations for supporting social awareness
J Le Maitre, M Chetouani – International Journal of Social Robotics, 2013 – Springer
… Filled pauses were not annotated, as the dialog system processed them … For automatic estimation, we followed the framework described in Fig. 4. A Vocal Activity Detector (VAD), suitable for real-time detection in robotics [8], is employed for speech segmentation …

Analysis of emotional speech—A review
P Gangamohan, SR Kadiri… – Toward Robotic Socially …, 2016 – Springer
… Systems recognizing speaker’s emotions, and also responding expressively, are essential for natural interaction. Research on emotions in speech has applications in spoken dialogue systems, automated response systems, call centers, etc …

Speaker activity detection for distributed microphone systems in cars
T Matheja, M Buck, T Fingscheidt – Proceedings of the 6th Biennial …, 2013 – researchgate.net
… speech application 1. INTRODUCTION In speech communication systems in automotive environments the interest in more comfort within hands-free telephony or speech dialog systems is increasing. Distributed and speaker …

Information fusion in automatic user satisfaction analysis in call center
J Sun, W Xu, Y Yan, C Wang, Z Ren… – … (IHMSC), 2016 8th …, 2016 – ieeexplore.ieee.org
… of dialogue into positive, negative and neutral for different corpus in call center, and comparing automatic speech segmentation and manual … 2] work, which presents a method for automatic prediction of the user’s mental states in a human-machine spoken dialogue system …

Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language Understanding.
S Stehwien, NT Vu – INTERSPEECH, 2016 – isca-speech.org
… extracted features that capture eg duration, fundamental frequency and voice quality and obtained promising results for automatic speech segmentation … The speech files consist of single utterances by speakers requesting flight information from a dialog system, for example …

Code-Switching event detection based on delta-BIC using phonetic eigenvoice models.
WB Liang, CH Wu, CS Hsu – INTERSPEECH, 2013 – pdfs.semanticscholar.org
… nology in a spoken dialogue system … Recently, speech segmentation approaches such as ΔBIC [17] and MDL [18] are commonly employed for speech segmentation. A detailed comparison was addressed in [19] and the delta-BIC is a widely used approach …

Data Driven Methods for Adaptation of ASR Systems
A AMARENDRA BABU, Y RAMADEVI… – … : Special Issue for the …, 2015 – World Scientific
… End point detection, and Speech segmentation … 14. Sungjin Lee and Maxine Eskenazi, “An Unsupervised Approach to User Simulation: Toward Self-Improving Dialog Systems”, Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue …

Evaluation of Hidden Semi-Markov Models Training Methods for Greek Emotional Text-to-Speech Synthesis
A Lazaridis, I Mporas – International Journal of Information …, 2013 – researchgate.net
… Over the last decades, the increasing interest in human-computer interaction and spoken dialog systems raised the need for … interests include speech and audio signal processing, pattern recognition, automatic speech recognition, automatic speech segmentation and spoken …

Processing units in conversation: A comparative study of French and Mandarin data
L Prévot, SC Tseng, K Peshkov… – Language and …, 2015 – journals.sagepub.com
Human spoken language production is directed towards communication delivering comprehensible information to recipients. Speech segmentation into small units eff…

Integrating natural language processing with image document analysis: what we learned from two real-world applications
J Chen, H Cao, P Natarajan – International Journal on Document Analysis …, 2015 – Springer
… other. A global inference on labels, such as sequential labeling, can reduce errors from predicting consecutive sentence boundaries. In addition, CRFs have proved quite successful in speech segmentation [27, 38]. For OOV …

Speech Dialog as a Part of Interactive “Human-Machine” Systems
R Potapova – International Conference on Interactive Collaborative …, 2016 – Springer
… Speech recognition Dialog system Acoustic features Formant tracking analysis Prosody … information requires a series of studies aimed at solving the following tasks: identification of prosodic characteristics of continuous speech segmentation on the …

Recognition of paralinguistic information in spoken dialogue systems for elderly people
H Pérez-Espinosa, J Martínez-Miranda – Mexican International Conference …, 2015 – Springer
… It contains recordings of children, non-natives and elderly people interacting in Dutch with a spoken dialogue system … models for paralinguistic phenomena recognition into interactive systems, it is necessary the development of an automatic speech segmentation method …

Computer-Assisted Language
T Kawahara, N Minematsu, S Aspect, SRT TK, PSM NM… – Citeseer
… 12 spectrum, pitch, power • Feature normalization required for objective comparison with model speaker • (Constrained) speech recognition (ASR) • Speech segmentation-alignment • Error detection • Scoring • Need to model non-native speech and handle erroneous input • Not …

Gamification: “The System Beats Human Huge Critical Thinking”
D Asha, P Mayilvahanan – ijeter.everscience.org
… 2.16 Speech segmentation: Given a sound clip of a person or people speaking, separate it into words … Information Retrieval Web search system, or a dialogue system. • Report Generation Generation of reports such as weather reports …

I Text
M Sergot – fi.muni.cz
… 195 Tomáš Dubeda, Jirí Hanika (Charles University, Prague) Automatic Speech Segmentation with the Application of the Czech TTS System … A Syntactical Model of Prosody as an Aid to Spoken Dialogue Systems in Italian Language …

Automatic Phonetic Segmentation Using the Kaldi Toolkit
J Matoušek, M Klíma – International Conference on Text, Speech, and …, 2017 – Springer
… 1283–1287. Singapore (2014). 2. Brognaux, S., Drugman, T.: HMM-based speech segmentation: improvements of fully automatic approaches … Plátek, O., Jurčíček, F.: Integration of an on-line Kaldi speech recogniser to the Alex dialogue systems framework …

A voice command detection system for aerospace applications
S Tabibian – International Journal of Speech Technology, 2017 – Springer
… Four major applications of keyword spotting are keyword monitoring, audio document indexing, command-controlled devices and dialogue systems … The main goal of speech segmentation is determining the boundary between different segments of utterances …

Unknown Word Detection Based on Event-Related Brain Desynchronization Responses
T Sasakura, S Sakti, G Neubig, T Toda… – … Dialog Systems and …, 2015 – Springer
… Future work includes improvement of the performance of the classifier, experiments in an environment like a real conversation, and application to a multi-modal dialog system. Notes. Acknowledgements … speech segmentation in Japanese …

Enhancements in Assamese spoken query system: Enabling background noise suppression and flexible queries
A Dey, S Shahnawazuddin, KT Deepak… – … (NCC), 2016 Twenty …, 2016 – ieeexplore.ieee.org
… 2687–2690. [4] JR Glass, “Challanges for spoken dialogue systems,” in Proc. IEEE ASRU workshop, 1999 … 30–42, January 2012. [8] KT Deepak, BD Sarma, and SRM Prasanna, “Foreground speech segmentation using zero frequency filtered signal,” in Proc. Interspeech …

Automated Speech Recognition System–A Literature Review
M Manjutha, J Gracy, P Subashini… – COMPUTATIONAL … – researchgate.net
… Some of the major growing applications are Language Identification, Speech Enhancement, Spoken Dialog System, Speaker Recognition and Verification, Speech Coding, Emotion and Attitude Recognition, Speech Segmentation and Labeling, Speech Recognition, Prosody …

Detecting Breathing Sounds in Realistic Japanese Telephone Conversations and Its Application to Automatic Speech Recognition
T Fukuda, O Ichikawa, M Nishimura – Speech Communication, 2018 – Elsevier
… The speech segmentation based on accurate breath-event detection provided a 3.8% relative error reduction in automatic speech recognition (ASR) … phones (Sainath et al., 2017), a voice control for car navigation systems (Wang et al., 2008), and a spoken dialog system for use …

Listening’To Dyslexic Children’s Reading: The Transcription And Segmentation Accuracy For ASR
H Husni, NNN Him, MM Radi, Y Yusof… – Journal of …, 2017 – journal.utem.edu.my
… [18] F. Cangemi, F., Cutugno, B. Ludusan, D. Seppi, CD Van, “Automatic speech segmentation for Italian (ASSI … [41] T. Baumann, C. Kennington, J. Hough, D. Schlangen, “Recognising conversational speech: What an incremental ASR should do for a dialogue system and how to …

Realization of Minimum Discursive Units Segmentation of Arab Oral Utterances.
C Lhioui, A Zouaghi, M Zrigui – Int. J. Comput. Linguistics Appl., 2016 – researchgate.net
… This greatly complicates the task of discursive segmentation [10]. 4.2 Specificities of the Oral One of the difficulties sources, when using the UDM transcribed speech segmentation, is linked to the particularity of the oral modality and the spontaneous nature of the interaction …

Emotion Identification from Spontaneous Communication
F Getahun, M Kebede – … & Internet-Based Systems (SITIS), 2016 …, 2016 – ieeexplore.ieee.org
… b) Induced Emotions In [17], Callejas et al. proposed a method for predicting the user mental state for the development of spoken dialogue systems. A corpus of spontaneous Spanish speech dialogue was acquired to recognize the emotional state of a user … Segmentation …

Novel alignment method for DNN TTS training using HMM synthesis models
S Suzić, T Delić, D Pekar… – Intelligent Systems and …, 2017 – ieeexplore.ieee.org
… The research was conducted within the project “Development of Dialogue Systems for Serbian and Other South Slavic Languages” (TR32035), financed by … 416-421, 2012 [13] F. Malfrère and T. Dutoit, “High quality speech synthesis for phonetic speech segmentation,” In Proc …

SPPAS tutorial: Methodology and software for the semi-automatic annotation of speech
B Bigi – 2015 – lpl-aix.fr
… It is able to produce automatically speech segmentation annotations from a recorded speech sound and its transcription. Some special features are also offered for managing corpora of annotated files … is also called Silence/Speech segmentation …

15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) Celebrating the Diversity of Spoken Languages
H Li, P Ching – 2014 – toc.proceedings.com
… Misu, Teruhisa: “Crowdsourcing for situated dialog systems in a moving car”, 125-129 … Robichaud, Jean-Philippe / Crook, Paul A. / Xu, Puyang / Khan, Omar Zia / Sarikaya, Ruhi: “Hypotheses ranking for robust domain classification and tracking in dialogue systems”, 145-149 …

State of Research of Speech Recognition
M Sarma, KK Sarma – … -Based Speech Segmentation using Hybrid Soft …, 2014 – Springer
… Phoneme-Based Speech Segmentation using Hybrid Soft Computing Framework pp 95-113 … 17. In 2013, a work has been reported by Mohan et al. [69] where a spoken dialog system is designed to use in agricultural …

Improvements in IITG Assamese Spoken Query System: Background Noise Suppression and Alternate Acoustic Modeling
S Shahnawazuddin, D Thotappa, A Dey… – Journal of Signal …, 2017 – Springer
… 3. Deepak, KT, Sarma, BD, & Prasanna, SRM (2012). Foreground speech segmentation using zero frequency filtered signal. In Proc. Interspeech. 4. Glass, JR (1999). Challanges for spoken dialogue systems. In Proc …

Tree-Structured Noise-Adapted HMM Modeling for Piecewise Linear
Z Zhang, K Otsuji, S Furui – t2r2.star.titech.ac.jp
… The proposed methods have been evaluated using a dialogue system, in which two kinds of real noise were added to speech at … the variation of noises for both training and testing, improving the tree structure of noise-adapted HMM, and automatic noise/speech segmentation …

Association (INTERSPEECH 2013)
C Fougeron, F Pellegrino – 2013 – pdfs.semanticscholar.org
… 388 Jani Nurminen, Hanna Silen, Moncef Gabbouj Avatar Therapy: An Audio-Visual Dialogue System for Treating Auditory Hallucinations 392 … 465 Juho Knuuttila, Okko Rasanen, Unto K. Laine Particle Swarm Optimisation of Spoken Dialogue System Strategies 470 …

Proceedings of 14th Annual Conference of the International Speech Communication Association
F Bimbot, C Fougeron, F Pellegrino… – 14th Annual Conference …, 2013 – repo.pw.edu.pl
… 1364 Avatar Therapy: an audio-visual dialogue system for treating auditory hallucinations Mark Huckvale, Julian Leff, Geoff Williams … 986 Particle Swarm Optimisation of Spoken Dialogue System Strategies Lucie Daubigney, Matthieu Geist, Olivier Pietquin …

Agreement and Disagreement Utterance Detection in Conversational Speech by Extracting and Integrating Local Features
A Ando, T Asami, M Okamoto, H Masataki… – … Annual Conference of …, 2015 – isca-speech.org
… After speech segmentation, we were left with 2987 utterances occupying 7.2 hours … 2477–2480, 2003. [12] S. Fujie, D. Yagi, H. Kikuchi and T. Kobayashi, “Prosody based Attitude Recognition with Feature Selection and Its Application to Spoken Dialog System as Para-Linguistic …

Fuzzy Logic in Speech Technology-Introductory and Overviewing Glimpses
HN Teodorescu – Fifty Years of Fuzzy Logic and its Applications, 2015 – Springer
… Speech vs. non-speech segmentation play an essential role in optimizing the communication bandwidth by suppressing transmission of un-voiced segments and in reducing the overall noise level (Bouquin-Jeannes and Faucon 1994) …

Prosodic classification of discourse markers
V Cabarrão, H Moniz, J Ferreira, F Batista, I Trancoso… – markers, 2015 – inesc-id.pt
… to include discourse markers in the language models for EP already trained with other structural metadata events, which will result in enriched automatic transcriptions, and to integrate the classifiers in dialogue systems … Speech Segmentation and Spoken Document Processing …

Implementation of robot journalism by programming custombot using tokenization and custom tagging
N Lee, K Kim, T Yoon – Advanced Communication Technology …, 2017 – ieeexplore.ieee.org
… During this process, Speech Segmentation, which divide the whole audio clip into word segments, is fundamental [14]. Semantic Role Labelling is also used for Dialogue System; it is a task which identifies sentence elements such as subject and object in each sentence [15] …

Fundamentals Tools of Modern Multilingual Speech Processing Technology in “Urdu” langauage Speech Processing
MW Ashfaque, QN Naveed, SS Banu, QSSA Ahmed – pdfs.semanticscholar.org
… tools are as follows • Tone analysis • Intonation analysis • Stress analysis • 4 Pause analysis 3.2 Speech segmentation Syllable-unit … According to past literature reviews, three applications including spoken dialogue systems, speech translation, and speaker recognition have …

Voice Analytics Process
SK Kopparapu – Non-Linguistic Analysis of Call Center Conversations, 2015 – Springer
… This is generally called speaker segmentation. 2.3 Agent Customer Speech Segmentation. A typical audio conversation available for processing and hence analysis is a conversation in natural language involving both the voice agent and the customer …

Crowdsourcing for speech processing: Applications to data collection, transcription and assessment
M Eskenazi, GA Levow, H Meng, G Parent… – 2013 – books.google.com
… for TTS: What Worked and What Did Not 7.4 Related Work: Detecting and Preventing Spamming 7.5 Our Experiences: Detecting and Preventing Spamming 7.6 Conclusions and Discussion References Chapter 8: Crowdsourcing for Spoken Dialog System Evaluation 8.1 …

Introduction
P Spyns – Essential Speech and Language Technology for Dutch, 2013 – Springer
… 3. “CommuneConnect!” (“GemeenteConnect”) is a phone dialogue system that allows for … 1.1) combines speech segmentation, speech recognition, text alignment and sentence condensation techniques to implement a less labour intensive semi-automatic tool to produce Dutch …

Soft-Computational Techniques and Spectro-Temporal Features for Telephonic Speech Recognition: An Overview and Review of
M Sharma, KK Sarma – Handbook of Research on Advanced …, 2015 – books.google.com
… The evaluation results reported by the authors here indicate high recognition accuracy up to 95% which makes the proposed solution a feasible one with addition to the existing spoken dialogue systems such as voice banking applications, call routes, voice portals, etc …

Automatic quality estimation for ASR system combination
S Jalalvand, M Negri, D Falavigna, M Matassoni… – Computer Speech & …, 2018 – Elsevier
… Voice search engines, voice question answering, broadcast news transcriptions, video/TV programs subtitling, meeting transcriptions and spoken dialog systems are just some of the many applications involving ASR technology …

Quality assessment for speaker diarization and its application in speaker characterization
C Vaquero, A Ortega, A Miguel… – IEEE Transactions on …, 2013 – ieeexplore.ieee.org
… according to the speech/non-speech labels. We assume that the speech/non-speech labels are obtained previously using a VAD or speech/non-speech segmentation strategy. 5) Resegmentation: Since the compensated speaker …

Code-switching event detection by using a latent language space model and the delta-Bayesian information criterion
CH Wu, HP Shen, CS Hsu – IEEE/ACM Transactions on Audio, Speech …, 2015 – dl.acm.org
… indispensable in human–machine communication applications, particularly in multilingual spoken dialog systems … Speech segmentation approaches, such as the Akaike information criterion [22], Bayesian information criterion (BIC), BIC [23], and minimum description length …

Interspeech 2013: Program
W Session – interspeech2013.org
… 1364 Avatar Therapy: an audio-visual dialogue system for treating auditory hallucinations Mark Huckvale, Julian Leff, Geoff Williams … 986 Particle Swarm Optimisation of Spoken Dialogue System Strategies Lucie Daubigney, Matthieu Geist, Olivier Pietquin …

Speech perception by humans and machines
MH Davis, O Scharenborg – Speech perception and spoken word …, 2017 – books.google.com
… contained in the speech waveform. The most probable word sequence can then be transcribed, used to drive a dialogue system, or for other purposes (see Young, 1996, for a more detailed review). Most machine speech recognition …

Theory and Applications of Natural Language Processing
G Hirst, E Hovy, M Johnson – 2013 – Springer
… analysis (robust parsing) Table 1.2 Summary of STEVIN scientific priorities–application oriented Embedding HLTD Speech (III) 1. Information extraction from audio transcripts created by speech recognisers 2. Speaker accent and identity detection 3. Dialogue systems and Q&A …

Synthesis and evaluation of conversational characteristics in speech synthesis
S Andersson – 2013 – era.lib.ed.ac.uk
… literature as eg speech acts (Searle, 1969), turns (Sacks et al., 1974), and more recent derivations of turns or speech acts as dialogue acts in speech synthesis and spoken dialogue systems (Campbell, 2005; Traum et al., 2008; Bunt et al., 2010). The main …

I. EARNED DEGREES
CHUI LEE – 2013 – pwp.gatech.edu
… Recognition,” IEEE Signal Processing Letters, Vol. 1, No. 8, pp. 124-125, August 1994. [25] C.-H. Lee, “Stochastic Modeling in Spoken Dialogue System Design,” Speech Communication, Vol. 15, pp. 311-322, Nov. 1994. [26] C.-S. Liu, C …

Disfluency Detection using a Noisy Channel Model and a Deep Neural Language Model
PJ Lou, M Johnson – Proceedings of the 55th Annual Meeting of the …, 2017 – aclweb.org
… Moreover, disfluencies pose a major challenge to natural language processing tasks, such as dialogue systems, that rely on speech transcripts (Ostendorf et al., 2008) … 2008. Speech segmentation and its impact on spoken document processing …

Computational Models for Analyzing Affective Behaviors and Personality from Speech and Text
F Alam – 2016 – eprints-phd.biblio.unitn.it
… 3.1.1 Annotation of Affective Behavior; 3.1.2 Transcriptions; 3.1.3 Speech vs Non-Speech Segmentation; 3.1.4 Corpus Analysis; 3.1.4.1 Corpus Summary …

An information-extraction approach to speech processing: Analysis, detection, verification, and recognition
CH Lee, SM Siniscalchi – Proceedings of the IEEE, 2013 – ieeexplore.ieee.org
INVITED PAPER. An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition. This paper presents an integrated detection and verification approach to information extraction …

Application of wavelets in speech processing
MH Farouk – 2014 – Springer
… Some of the topics covered in this series include the presentation of real life commercial deployment of spoken dialog systems, contemporary methods of speech parameterization, developments in information security for automated speech, forensic speaker recognition, use …

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
J Tejedor, DT Toledano, P Lopez-Otero… – EURASIP Journal on …, 2015 – Springer
… The evaluation campaigns provide an objective mechanism to compare different systems and are a powerful way to promote research on different speech technologies (eg, speech segmentation [34], speaker diarization [35], language recognition [36], query-by-example spoken …

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
M Palmer, R Hwa, S Riedel – Proceedings of the 2017 Conference on …, 2017 – aclweb.org
EMNLP 2017: The Conference on Empirical Methods in Natural Language Processing. Proceedings of the Conference, September 9-11, 2017, Copenhagen, Denmark. ©2017 The Association for Computational Linguistics …

VoiceMask: Anonymize and Sanitize Voice Input on Mobile Devices
J Qian, H Du, J Hou, L Chen, T Jung, XY Li… – arXiv preprint arXiv …, 2017 – arxiv.org
… warping) based method. Keyword spotting identifies keywords directly from the utterances. It is usually used for keyword monitoring, speech document indexing, and automated dialogue systems [11], [37]. We design an adaptive …

Optimized dynamic programming search for automatic speech recognition on a Graphics Processing Unit (GPU) platform using Compute Unified Device Architecture …
BB Letswamotse – 2014 – dspace.nwu.ac.za
North-West University (Yunibesiti ya Bokone-Bophirima, Noordwes-Universiteit), Mafikeng Campus Library …

Semi-supervised acoustic model training by discriminative data selection from multiple ASR systems’ hypotheses
S Li, Y Akita, T Kawahara – IEEE/ACM Transactions on Audio, Speech …, 2016 – dl.acm.org
… Fig. 1. The process flow is as follows. A. Process Flow 1) Preprocessing and Hypothesis Generation: For preprocessing, we first conduct speech segmentation to the utter- …

Natural Language Understanding and Prediction Technologies
N Duta – ijcai-15.org
… mixture with about 10,000 Gaussians) – Tied Mixture Weights … IJCAI 2015 Tutorial. Large vocabulary continuous speech recognition: the BBN EARS system. Feature extractor; speech segmentation (silence detection); acoustic features; speech segments …

Efficient Setup of Acoustic Models for Large Vocabulary Continuous Speech Recognition
DIC Gollan, IH Ney, PDDL Lamel – publications.rwth-aachen.de
From the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University, for the attainment of the academic degree of …

Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training
S Li – 2016 – repository.kulib.kyoto-u.ac.jp
Title: Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training (Dissertation). Author: Li, Sheng. Citation: Kyoto University. Issue Date: 2016-03-23. URL: https://dx.doi.org/10.14989/doctor.k19849 …

Romanian phonetic transcription dictionary
J Domokos, O Buza, G Toderean – researchgate.net
… In this way it is possible to skip the grapheme-to-phoneme conversion task. There is some recent research reported in the literature about using grapheme models instead of phoneme models for speech segmentation [20], [21] …

Confidence Measures for Automatic and Interactive Speech Recognition
IS Cortina – 2016 – riunet.upv.es
Confidence Measures for Automatic and Interactive Speech Recognition. January, 2016. Ph.D. Dissertation by Isaías Sánchez Cortina. Advisors: Dr. Alfons Juan i Ciscar, Dr. J. Alberto Sanchis Navarro. Resum …

MULTI-LINGUAL MARKET INFORMATION INTERACTIVE VOICE RESPONSE SYSTEM
H ENDALE – 2014 – etd.aau.edu.et
Addis Ababa University, School of Graduate Studies, College of Natural Sciences, Department of Computer Science. Multi-Lingual Market Information Interactive Voice Response System. Honelet Endale …

Ultra low bit-rate speech coding
V Ramasubramanian, H Doddala – 2015 – Springer
… Some of the topics covered in this series include the presentation of real life commercial deployment of spoken dialog systems, contemporary methods of speech parameterization, developments in information security for automated speech, forensic speaker recognition, use …

Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training
S Li, Y Akita, T Kawahara – IEICE TRANSACTIONS on Information …, 2015 – search.ieice.org
… We first conduct speech segmentation to the utterance unit based on the BIC (Bayesian Information Criterion) method [23] and speaker … He has published more than 250 technical papers on speech recognition, spoken language processing, and spoken dialogue systems …

Discriminative Acoustic Features for Deployable Speech Recognition
A Faria – 2016 – eecs.berkeley.edu
Arlo Faria. Electrical Engineering and Computer Sciences, University of California at Berkeley. Technical Report No. UCB/EECS-2016-199. http://www2 …

Referential grounding towards mediating shared perceptual basis in situated dialogue
C Liu – 2015 – search.proquest.com
… The collaborative model and the concept of grounding have motivated previous work on spoken dialogue systems [45], embodied … They proposed an information theoretic framework which could learn models for speech segmentation, word discovery and visual categorization …

Non-Linguistic Analysis of Call Center Conversations
SK Kopparapu – 2015 – Springer
… 2.2 Music Voice Separation; 2.3 Agent Customer Speech Segmentation; 2.4 Speech to Text Conversion; References …

Computational modeling of turn-taking dynamics in spoken conversations
SA Chowdhury – 2017 – eprints-phd.biblio.unitn.it
PhD Dissertation. International Doctorate School in Information and Communication Technologies, DISI – University of Trento. Computational Modeling of Turn-Taking Dynamics in Spoken Conversations. Shammur Absar Chowdhury. Advisor: Prof …

Towards an Interactive Human-Robot Relationship: Developing a Customized Robot Behavior to Human Profile
A Aly – 2014 – pastel.archives-ouvertes.fr
… 4.5 Multimodal data segmentation; 4.5.1 Gesture Segmentation; 4.5.2 Speech Segmentation; 4.6 Multimodal data characteristics validation …

Domain- and Language-adaptable Natural Language Controlling Framework
P Barabás, I Juhász – 2013 – hjphd.iit.uni-miskolc.hu
… QLF: Quasi-Logical Form; RDF: Resource Description Framework; SDK: Software Development Kit; SDS: Speech Dialog System; SNLP: Stanford NLP; SNLPG: Stanford Natural Language Processing Group; SPO: Subject-Predicate-Object; SRM: Semantic Representation Model; TAG …

Domain- and Language-Adaptive Natural Language Controlling Framework
P Barabás – 2013 – 193.6.1.94
… OWL: Web Ontology Language; POI: Point Of Interests; POS: Part-Of-Speech; QLF: Quasi-Logical Form; RDF: Resource Description Framework; SDK: Software Development Kit; SDS: Speech Dialog System; SNLP: Stanford NLP; SNLPG: Stanford Natural Language Processing Group …

Machine learning for gesture recognition from videos
BG Gebre – 2015 – repository.ubn.ru.nl
… Gesture stroke detection is one of the main preprocessing tasks in gesture studies. The task can be likened to speech segmentation or word tokenization. This study contributes to the literature by proposing an adaptive user-controlled solution to gesture stroke detection …

Automatic text and speech processing for the detection of dementia
K Fraser – 2016 – search.proquest.com
Abstract. Dementia is a gradual cognitive decline that typically occurs as a consequence of neurodegenerative disease, and can result in language deficits (i.e., aphasia) …

NLTK essentials
N Hardeniya – 2015 – books.google.com
… translation; Statistical machine translation; Information retrieval; Boolean retrieval; Vector space model; The probabilistic model; Speech recognition; Text classification; Information extraction; Question answering systems; Dialog systems; Word …

Thesis Material-Rasmus Dall
R Dall – 2017 – datashare.is.ed.ac.uk
Statistical Parametric Speech Synthesis Using Conversational Data and Phenomena. Rasmus Dall. Doctor of Philosophy, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 2017 …

4 Driver mirror-checking action detection
N Li, C Busso – Vehicle Systems and Driver Modelling: DSP …, 2017 – books.google.com
Nanxiang Li and Carlos Busso. 4 Driver mirror-checking action detection using multi-modal signals. The University of Texas at Dallas, Richardson, TX 75080, USA. e-mail: {nxl056000, busso}@utdallas …

Sequential decisions and predictions in natural language processing
H He – 2016 – search.proquest.com
… Similarly, machine translation has largely focused on batch translation at the sentence level. Although there exists work on translation at the sub-sentence level based on speech segmentation, little work exploits linguistic knowledge and strategies of human interpreters …

Linguistic Linked Open Data
D Trandabăț, D Gîfu – 2016 – Springer
Diana Trandabăț, Daniela Gîfu (Eds.). 12th EUROLAN 2015 Summer School and RUMOUR 2015 Workshop, Sibiu, Romania, July 13–25, 2015, Revised Selected Papers. Communications in Computer and Information Science 588 …