Speaker Recognition & Dialog Systems

Notes:

This text discusses the use of spoken language processing technology in applications including speech translation, multilingual speech recognition, user interaction with spoken dialog systems, and speaker recognition. Some of these applications must handle multiple languages in their input, and differences between adult and child speech affect the performance of speaker recognition tasks. Chatbots and voice interfaces, such as automatic speech recognition and artificial dialog systems, are used in businesses and the public sector. Other uses of speech processing include keyword monitoring, audio document indexing, command control devices, and routing multimedia files and streams based on their content. Machine learning techniques such as artificial neural networks and support vector machines are common in speech processing, and the classifier should be selected according to the properties of the dataset.

Automated speaker recognition is the use of a computer program to identify a speaker, or to verify a claimed identity, from the characteristics of their voice.
Automatic speaker recognition is, in practice, a synonym for automated speaker recognition; the term stresses that identification or verification runs without human intervention.
An intelligent voice system is a computer system designed to recognize and respond to spoken commands or questions.
Speaker characterisation (or characterization) is the process of analyzing a speaker’s voice to describe the speaker, for example by estimating attributes such as age, gender, or emotional state, or to support identification and verification.
Speaker diarisation (or diarization) is the process of automatically determining who spoke when in an audio or video recording, that is, splitting the recording into segments and labeling each segment by speaker.
Speaker identification is the task of determining which of a set of enrolled speakers produced a given utterance (a one-to-many comparison); speaker verification, by contrast, accepts or rejects a claimed identity (a one-to-one comparison).
A speaker recognition module is a software component that performs speaker recognition tasks, such as identification or verification.
Voice biometrics is the use of speaker recognition technology to identify an individual, or to verify their identity, from the unique characteristics of their voice.
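
The difference between speaker identification (one-to-many) and speaker verification (one-to-one) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-dimensional "voiceprints" are toy vectors standing in for real speaker embeddings (such as the i-vectors or x-vectors mentioned in the references below), the names `identify` and `verify` are hypothetical, and the 0.8 threshold is arbitrary rather than a recommended value.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, enrolled):
    """Identification (1:N): return the enrolled speaker most similar to the probe."""
    return max(enrolled, key=lambda name: cosine(probe, enrolled[name]))


def verify(probe, claimed, enrolled, threshold=0.8):
    """Verification (1:1): accept the claimed identity only if similarity is high enough.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return cosine(probe, enrolled[claimed]) >= threshold


# Toy enrolled voiceprints (hypothetical data; real embeddings have hundreds of dimensions).
enrolled = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
}
probe = [0.85, 0.15, 0.25]  # embedding of an unknown utterance

print(identify(probe, enrolled))         # closest enrolled speaker
print(verify(probe, "alice", enrolled))  # claim checked against alice only
print(verify(probe, "bob", enrolled))    # claim checked against bob only
```

Identification always returns some enrolled speaker, even for an outsider, which is why deployed systems usually combine it with a verification-style threshold.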

Resources:

cris.ai .. customize microsoft’s speech-to-text engine for your application
matlab-recognition-code.com .. matlab source code for speaker identification based on neural networks
nist.gov/itl/iad/mig/speaker-recognition .. nist multimodal information group speaker recognition evaluation (sre)
speech.sri.com/projects/sitw .. speakers in the wild – speaker recognition challenge – sri international
vidispine.com .. api for complex data-driven, cloud-based video content management solutions

References:

Speech recognition in a dialog system: from conventional to deep processing
A Becerra, JI de la Rosa, E González – Multimedia Tools and Applications, 2018 – Springer
… Section 3 discusses the fundamentals about the components of a spoken dialog system … For instance, Nakagawa et al. [54] defined a text-independent/text-prompted speaker recognition method by combining speaker-specific Gaussian mixture model (GMM) with syllable-based …

Determining speaker attributes from stress-affected speech in emergency situations with hybrid SVM-DNN architecture
J Ahmad, M Sajjad, S Rho, S Kwon, MY Lee… – Multimedia Tools and …, 2018 – Springer
… growing applications in communication, human-computer interaction (HCI), telephone speech forensic analysis, and natural language dialog systems … For instance, accurate gender information can improve speaker recognition systems by reducing the search space to only a …

Deeply fused speaker embeddings for text-independent speaker verification
G Bhattacharya, J Alam, V Gupta… – Proc …, 2018 – pdfs.semanticscholar.org
… These features allow RNNs to capture temporal dependencies in the data. Consequently, RNNs have been used extensively in a variety of machine learning problems ranging from translation and dialogue systems to speech and speaker recognition …

DNN-HMM based automatic speech recognition for HRI scenarios
J Novoa, J Wuth, JP Escudero, J Fredes… – Proceedings of the …, 2018 – dl.acm.org
… unexpected distortions. As an example, in [52], it was investigated whether the open-source speech recognizer Sphinx can be tuned to outperform Google cloud-based speech recognition API in a spoken dialog system task. By …

Speech to speech interaction system using Multimedia Tools and Partially Observable Markov Decision Process for visually impaired students
S Lokesh, B Kanisha, S Nalini, MR Devi… – Multimedia Tools and …, 2018 – Springer
… Fig. 1 Classification of dialogue systems … This system is found to be a real-world example for POMDP dialogue system … In addition, contextual noise, co-articulation, and accent are found to be the composite issues in speaker recognition system …

Assessment of pitch-adaptive front-end signal processing for children’s speech recognition
R Sinha, S Shahnawazuddin – Computer Speech & Language, 2018 – Elsevier
… Differences in the acoustic and linguistic correlates of speech from adult and child speakers have been observed to affect the performances of speaker recognition tasks as well (Safavi, Najafian, Hanani, Russell, Jancovic, J.Carey, 2012, Safavi, Najafian, Hanani, Russell …

Deep learning bottle-neck features for speaker recognition
E Cumalat Puig – 2018 – upcommons.upc.edu
… 2.1.1 Speaker recognition Speaker recognition is the art of identifying a person using its voice features (in the time and/or frequency domain) … Some applications are chatbots, self-driving cars and facial recognition systems …

Persian Vowel recognition with MFCC and ANN on PCVC speech dataset
S Malekzadeh, MH Gholizadeh, SN Razavi – arXiv preprint arXiv …, 2018 – arxiv.org
… of quite a number of applications using automatic speech recognition (ASR), including command and control, dictation, dialog systems for people … to some typical LVASR systems [4], it can be found in applications related to language and speaker recognition, music identification …

On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks
JHL Hansen, H Bořil – Speech Communication, 2018 – Elsevier
… Non-speech sounds—studies have also explored speaker recognition solutions when the audio stream contains non-speech vocalizations such as screams … Human-to-machine: the subject is directing their speech towards a piece of technology (eg, a spoken dialog system via a …

Persian phonemes recognition using PPNet
S Malekzadeh, MH Gholizadeh, SN Razavi – arXiv preprint arXiv …, 2018 – arxiv.org
… of quite a number of applications using automatic speech recognition (ASR), including command and control, dictation, dialog systems for people … to some typical LVASR systems [4], it can be found in applications related to language and speaker recognition, music identification …

Short Utterance Speaker Recognition by Reservoir with Self-Organized Mapping
N Ikeda, Y Sato, H Takahashi – 2018 IEEE Spoken Language …, 2018 – ieeexplore.ieee.org
… A high-accuracy speaker recognition method for short utterances enables spoken dialogue systems to recognize a speaker with a wake-up-word, which is generally short, or automatic transcription systems to identify which participant is speaking in a meeting by enrollment of …

Effects of Transmitted Speech Bandwidth on Subjective Assessments of Speaker Characteristics
LF Gallardo – 2018 Tenth International Conference on Quality …, 2018 – ieeexplore.ieee.org
… voices [1], gains special relevance with the increasing interest in chatbot-based technology of … for characterizing users are gaining new relevance with the prominence of chatbots … [18] L. Fernández Gallardo, Human and Automatic Speaker Recognition over Telecommunication …

Speaker and language recognition and characterization: Introduction to the CSL special issue
E Lleida, LJ Rodriguez-Fuentes – 2018 – Elsevier
… Speaker recognition (SR) has gained importance in the field of speech science and technology, with new applications beyond forensics … from the community as an auxiliary technology for speech recognition (Gonzalez-Dominguez et al., 2015b), dialogue systems (Lopez-Cozar …

Development Of High-Performance And Large-Scale Vietnamese Automatic Speech Recognition Systems
DQ Truong, PN Phuong, TH Tung, LC Mai – Journal of Computer Science …, 2018 – vjs.ac.vn
… They have a wide range of applications such as controlling robots, call center analytic, voice chatbot … The i-vector was initially introduced for speaker recognition tasks [4], and recently has drawn researcher attention in the field of speech recognition …

Speaker Recognition for Robotic Control via an IoT Device
Z Kozhirbayev, BA Erol, A Sharipbay… – 2018 World …, 2018 – ieeexplore.ieee.org
… 2, no. 3, pp. 456–459, 1994. [10] TJ Hazen, DA Jones, A. Park, LC Kukolich, and DA Reynolds, “Integration of speaker recognition into conversational spoken dialogue systems,” in Eighth European Conference on Speech Communication and Technology, 2003 …

Recognizing emotions from speech using a physical model
N Kitaoka, S Segawa, R Nishimura… – Acoustical Science and …, 2018 – jstage.jst.go.jp
… and use the resulting emotional state determinations for spoken dialog system control … It would be useful if automated spoken dialog systems could estimate a wider range of user … for emotion recognition [3]. Spectral information used for speech and speaker recognition such as …

Three-stage speaker verification architecture in emotional talking environments
I Shahin, AB Nassif – International Journal of Speech Technology, 2018 – Springer
… 1 Introduction. “Speaker identification and speaker verification (authentication)” are two main branches of speaker recognition … Speaker recognition comes in two forms in terms of spoken text: “text-dependent” and “text-independent” …

JSpeech: a multi-lingual conversational speech corpus
AJ Choobbasti, ME Gholamian… – 2018 IEEE Spoken …, 2018 – ieeexplore.ieee.org
… ABSTRACT Speech processing, automatic speech and speaker recognition are the major area of interests in the field of computational linguistics. Research and development of computer and human interaction, forensic technologies and dialogue systems have been the …

Generative Adversarial Network
H Lee, Y Tsao – researchgate.net
Generative Adversarial Network and its Applications to Signal Processing and Natural Language Processing. Hung-yi Lee and Yu Tsao. Outline. Part I: General Introduction of Generative Adversarial Network (GAN). Part II: Applications on Signal Processing …

IIITH-ILSC Speech Database for Indian Language Identification
RK Vuddagiri, K Gurugubelli, P Jain… – Proc. The 6th Intl … – researchgate.net
… When a LID system is used as a front-end switch of a dialog system, the phonotactic constraints of the language can aid the dialog systems to operate … Extraction and representation of prosodic features for language and speaker recognition,” Speech communication …

Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training
A Ando, R Asakawa, R Masumura… – Proc. Interspeech …, 2018 – isca-speech.org
… of lexical information in several tasks like spoken language recognition [11,12] or speaker recognition [13] … Utterances gathered from spoken dialog systems in real … For our experiments, we newly collected Japanese speech samples toward a spoken dialog system for real …

Acoustic feature analysis and discriminative modeling for language identification of closely related South-Asian languages
F Adeeba, S Hussain – Circuits, Systems, and Signal Processing, 2018 – Springer
… It will improve the performance of speech translation [53], multilingual speech recognition [32], user interaction with spoken dialog system [60], and … identification, the dataset can be effectively used in other speech processing applications such as speaker recognition and speech …

Prediction of Dialogue Success with Spectral and Rhythm Acoustic Features Using DNNS and SVMS
A Lykartsis, M Kotti, A Papangelis… – 2018 IEEE Spoken …, 2018 – ieeexplore.ieee.org
… Jerry Wright, Allen Gorin, and Diane Litman, “Learning to predict problematic situations in a spoken dialogue system: experiments with … [16] Joseph Tepperman, David Traum, and Shrikanth Narayanan, “Yeah right: Sarcasm recognition for spoken dialogue systems,” in Ninth …

Speech and Computer: 20th International Conference, SPECOM 2018, Leipzig, Germany, September 18–22, 2018, Proceedings
A Karpov, O Jokisch, R Potapova – 2018 – books.google.com
… Contents … Choosing a Dialogue System’s Modality in Order to Minimize User’s Workload … Arman Kaliyev, Sergey V. Rybin, and Yuri N. Matveev … Optimized Active Learning Strategy for Audiovisual Speaker Recognition …

Exploring End-To-End Attention-Based Neural Networks For Native Language Identification
R Ubale, Y Qian, K Evanini – 2018 IEEE Spoken Language …, 2018 – ieeexplore.ieee.org
… spoken language technologies such as automatic speech recognition (ASR), speaker recognition, and voice … can help design a more personalized conversation with a dialog system targeted at … components for spoken language understanding in spoken dialog systems with a …

Study on the application of cloud computing and speech recognition technology in English teaching
L Wei – Cluster Computing – Springer
… Experimental results show that the efficiency of the man–machine dialogue system with each link based on cloud computing is improved significantly … Jiang, N., Qiu, M., Dai, W.: SROC: a speaker recognition with data decision level fusion method in cloud environment …

Curriculum learning based approach for noise robust language identification using DNN with attention
RK Vuddagiri, HK Vydana, AK Vuppala – Expert Systems with Applications, 2018 – Elsevier
… Growing interest in multilingual dialog systems has developed a lot of scientific attention towards … specific phonotactic constraints could be used to operate the dialog system more robustly … earlier studies for developing LID systems are inspired by the speaker recognition systems …

Improved Language Identification Using Stacked SDC Features and Residual Neural Network
RK Vuddagiri, HK Vydana, AK Vuppala – Proc. The 6th Intl. Workshop on … – researchgate.net
… system refers to a module that can tag the input speech with its language identity [1]. LID system has multiple applications in multilingual dialog systems and information … Earliest attempts for training implicit LID systems are inspired from speaker recognition frameworks …

Rapid Collection of Spontaneous Speech Corpora Using Telephonic Community Forums.
AA Raza, A Athar, S Randhawa, Z Tariq, MB Saleem… – Interspeech, 2018 – zaintq.com
… While such situations could benefit from a two-way spoken interaction (dialog systems), the lack of linguistic resources to enable … ldc.upenn.edu/LDC96S46 [34] A. Martin and M. Przybocki, “The NIST 1999 speaker recognition evaluation – an overview,” Digital signal processing, vol …

Automatic Turn-Level Language Identification for Code-Switched Spanish–English Dialog
V Ramanarayanan, R Pugh, Y Qian… – Proc. of the IWSDS …, 2018 – vikramr.com
… way to proceed if one is only concerned with one or two language pairs, as we scale up code-switched dialog systems to multiple … Initially introduced for speaker recognition [29], i-Vectors have also been shown to be particularly useful features for language recognition (see for …

Azure Cognitive Services
S Machiraju, R Modi – Developing Bots with Microsoft Bots Framework, 2018 – Springer
… Chat bots respond using a predefined set of rules and hence the responses are limited … If you throw a random request at the chat bot, it might not respond with a meaningful message or might just respond with a generic message … Speaker Recognition API …

iSocioBot: A Multimodal Interactive Social Robot
ZH Tan, NB Thomsen, X Duan, E Vlachos… – International Journal of …, 2018 – Springer
… 4.2 Speaker Identification. For speaker recognition we use the i-vector framework, which is currently the state-of-the-art within this field [11] … This is handled by using the ALICE chatbot which is an open source chatbot based on artificial intelligence markup language (AIML) …

A Survey on Gender and Emotion Recognition Using Voice
P Rani, MB Yadav – 2018 – ijrra.net
… In general identification of a speaker gender is important for increasingly natural and personalised dialogue systems … WU Zunjing , CAO Zhigang, published in 2005 describes, the Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speaker recognition …

Improving children’s mismatched asr using structured low-rank feature projection
S Shahnawazuddin, HK Kathania, A Dey… – Speech …, 2018 – Elsevier
… The differences in the acoustic and the linguistic correlates of speech from adult and child speakers have been observed to affect the performance of speaker recognition tasks as well (Safavi, Najafian, Hanani, Russell, Jancovic, J.Carey, 2012, Safavi, Najafian, Hanani, Russell …

Is human-human spoken interaction manageable? The emergence of the concept: ‘Conversation Intelligence’
V Silber-Varod – Online Journal of Applied Knowledge Management …, 2018 – iiakm.org
… CI research is strongly related to spoken dialogue systems like Siri (Apple.com) or Cortana (Microsoft.com), and other personal … videos, such as: face recognition, speech/pause ratios, assigning the automatic transcription to a speaker (speaker recognition), speaker appearance …

Staircase Network: structural language identification via hierarchical attentive units
TN Trong, V Hautamäki, K Jokinen – arXiv preprint arXiv:1804.11067, 2018 – arxiv.org
Staircase Network: structural language identification via hierarchical attentive units. Trung Ngo Trong (1), Ville Hautamäki (1), Kristiina Jokinen (2). (1) School of Computing, University of Eastern Finland, Finland; (2) AI Research Center, AIST Tokyo Waterfront, Japan …

All Your Alexa Are Belong to Us: A Remote Voice Control Attack against Echo
X Yuan, Y Chen, A Wang, K Chen… – 2018 IEEE Global …, 2018 – ieeexplore.ieee.org
… That is, besides using the voice pattern for authentication, Amazon Echo will act like a chatbot and ask questions on the fly. The questions can be based on user historical profile that was registered previously … Challenge-based speaker recognition for mobile authentication …

The Implementation of Conversation Bot for Smart Home Environment
CF Chih, SJ Hsu, PT Chen, YJ Chen, CY Lu – International Conference on …, 2018 – Springer
… Until now, we use LINE as the chat bot platform to implement our proposed system … 31–35 (2017)Google Scholar. 18. Vacher, M., Lecouteux, B., Romero, JS, Ajili, M., Portet, F., Rossato, S.: Speech and speaker recognition for home automation: preliminary results …

A Corrective Learning Approach for Text-Independent Speaker Verification
Y Wen, T Zhou, R Singh, B Raj – 2018 IEEE International …, 2018 – ieeexplore.ieee.org
… speaker recognition for transparent command ambiguity resolution and continuous access control,” June 6 2000, US Patent 6,073,101. [3] Felix Burkhardt, Richard Huber, and Anton Batliner, “Application of speaker classification in human machine dialog systems,” Speaker …

Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition
T Fukuda, O Ichikawa, M Nishimura – Speech Communication, 2018 – Elsevier
… Typical examples of the applications include a voice search used in mobile phones (Sainath et al., 2017), a voice control for car navigation systems (Wang et al., 2008), and a spoken dialog system for use with robots (Williams and Young, 2007) …

Discriminative keyword spotting using triphones information and N-best search
S Tabibian, A Akbari, B Nasersharif – Information Sciences, 2018 – Elsevier
… Five major applications of KWS are keyword monitoring, audio document indexing, command control devices, dialogue systems and routing multimedia files and streams according to their content. We discuss some of these applications with more detail in the following …

Human Language Technologies for Under-Resourced African Languages: Design, Challenges, and Prospects
ME Ekpenyong – 2018 – books.google.com
… Some of the topics covered in this series include the presentation of real life commercial deployment of spoken dialog systems, contemporary methods of speech parameterization, developments in information security for automated speech, forensic speaker recognition, use …

Feature selection and nuisance attribute projection for speech emotion recognition
M Man-Wai – eie.polyu.edu.hk
… A typical example of such scenario is spoken dialog systems for customer services. In recent years, much progress in speech emotion recognition has been made … Unlike speaker recognition, speaker variability is one of the nuisance attributes that we want to remove …

A Comparison Study of Face, Gait and Speech Features for Age Estimation
P Punyani, R Gupta, A Kumar – Advances in Electronics, Communication …, 2018 – Springer
… Natural interaction with the dialogue systems, target advertising based on age, forensic studies, user characterization and pairing of caller and agent in … Cieri, C., Corson, L., Graff, D., Walker, K.: Resources for new research directions in speaker recognition: The Mixer 3, 4 and 5 …

Recent trends in deep learning based natural language processing
T Young, D Hazarika, S Poria… – IEEE Computational …, 2018 – ieeexplore.ieee.org
… processed in less than a second [1]. NLP enables computers to perform a wide range of natural language related tasks at all levels, ranging from parsing and part-of-speech (POS) tagging, to machine translation and dialogue systems …

Deep Learning with Applications Using Python
NK Manaswi, NK Manaswi, S John – 2018 – Springer
… Designs and Functions of Chatbots … Steps for Building a Chatbot … Preprocessing Text and Messages …

Multi-task Learning in Prediction and Correction for Low Resource Speech Recognition
J Tao – Man-Machine Speech Communication: 14th National …, 2018 – books.google.com
… for several language processing predictions; [15] improves intent classification in goal oriented human-machine spoken dialog systems which is … of Interspeech (2014) Rozi, A., Wang, D., Zhang, Z.: An open/free database and Benchmark for Uyghur speaker recognition …

Introducing assistive analytical agent for exploratory and decision making in analytical multi-user sessions
Y Fındık – 2018 – research.sabanciuniv.edu
Introducing Assistive Analytical Agent for Exploratory and Decision Making in Analytical Multi-User Sessions, by Yasin Fındık. Submitted to the Graduate School of Engineering and Natural Sciences in partial fulfillment of …

Style transfer in text: Exploration and evaluation
Z Fu, X Tan, N Peng, D Zhao, R Yan – Thirty-Second AAAI Conference on …, 2018 – aaai.org
… RUBER (Tao et al. 2017) was proposed to evaluate dialog system, it divides evaluation into referenced and unreferenced part … (Zhou et al. 2017) controls emotion of conversation, it also uses a classifier to evaluate chatbot generated emotional response …

Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition
SG Koolagudi, YVS Murthy, SP Bhaskar – International Journal of Speech …, 2018 – Springer
In this paper, the process of selecting a classifier based on the properties of dataset is designed since it is very difficult to experiment the data on n—number of classifiers. As a case study…

Shifting Sands in Second Language Pronunciation Teaching and Assessment Research and Practice
T Isaacs – Language Assessment Quarterly, 2018 – Taylor & Francis
… For example, it could be expedient to examine in developing or validating a test consisting of dialogue systems with avatars (see Mitchell, Evanini, & Zechner, 2014, for an example of a spoken dialogue system for L2 learners) …

A Brazilian Speech Database
MAD Paulino, ASB Junior, AR Svaigen… – 2018 IEEE 30th …, 2018 – ieeexplore.ieee.org
… Besides presenting the ELSDSR, the authors performed speaker recognition experiments on the database using Mel Frequency Cepstral … in Mandarin real-traffic dataset, which contains 17,408 utterances collected from a Microsoft spoken dialogue system, totalizing 16 hours …

An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods
I Odriozola, I Hernaez, E Navas – Expert Systems with Applications, 2018 – Elsevier
… Mporas, Kocsis, Ganchev, and Fakotakis (2010) the authors use Automatic Speech Recognition (ASR) technology with a VAD to develop a dialogue system in a … Tirumala, Shahamiri, Garhwal, and Wang (2017) identify VAD as one of the research areas for speaker recognition …

Audio event recognition in the smart home
S Krstulović – Computational Analysis of Sound Scenes and Events, 2018 – Springer
… public is the level of maturity and usability achieved by voice interfaces, ie, automatic speech recognition, speech synthesis, and artificial dialog systems … AER research, just like it has been a topic of interest for decades in the context of speech and speaker recognition research …

Training deep neural networks with non-uniform frame-level cost function for automatic speech recognition
A Becerra, JI de la Rosa, E González… – Multimedia Tools and …, 2018 – Springer
Multimed Tools Appl (2018) 77:27231–27267, https://doi.org/10.1007/s11042-018-5917-5. Training deep neural networks with non-uniform frame-level cost function for automatic speech recognition. Aldonso Becerra …

An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods
I Odriozola Sustaeta, H Rioja, I Concepción… – 2018 – addi.ehu.es
… In Mporas et al. (2010) the authors use Automatic Speech Recognition (ASR) technology with a VAD to develop a dialogue system in a motorcycle environment. Principi et al … Tirumala et al. (2017) identify VAD as one of the research areas for speaker recognition …

Pitch Range Estimation with Multi features and MTL-DNN Model
Q Zhang, C Cao, T Li, Y Xie… – 2018 14th IEEE …, 2018 – ieeexplore.ieee.org
… classification could be improved when taking the pitch range into consideration in dialogue systems [5]. The current methods of estimating the speaker’s pitch range were … [19] Tang Z, Li L, Wang D. Multi-task Recurrent Model for Speech and Speaker Recognition[C] //Signal and …

Construction of Spontaneous Emotion Corpus from Indonesian TV Talk Shows and Its Application on Multimodal Emotion Recognition
N Lubis, D Lestari, S Sakti, A Purwarianti… – … on Information and …, 2018 – search.ieice.org
… of complex and emotionally advanced system, in particular spoken dialogue systems, eg Sensitive Artificial Listener [1], personable in-car assistant [2], and an embodied conversational companion [3]. This increasing interest in the topic is partly owing to the challenges held …

NeuroSpeech: An open-source software for Parkinson’s speech analysis
JR Orozco-Arroyave, JC Vásquez-Correa… – Digital Signal …, 2018 – Elsevier

Scalable Hierarchical Language Identification System.
S Irtza – 2018 – unsworks.unsw.edu.au
Scalable Hierarchical Language Identification System. A thesis submitted for the degree of Doctor of Philosophy by Saad Irtza. Supervisor: Prof. Eliathamby Ambikairajah. Joint Supervisor: Dr. Vidhyasaharan Sethu. School of Electrical Engineering and Technology …

Investigating Utterance Level Representations for Detecting Intent from Acoustics
SK Rallabandi, B Karki, C Viegas, E Nyberg… – Proc. Interspeech …, 2018 – isca-speech.org
… Paralinguistic information also has applications in other domains of speech processing such as dialog systems, speech synthesis, voice conversion, assistance … As the original data did not have speakers tagged per utterance, we have tried to do speaker recognition using length …

Extraction of Prosody for Automatic Speaker, Language, Emotion and Speech Recognition
L Mary – 2018 – books.google.com
… This helped in the study of long- term features for speaker recognition … Applications such as spoken dialog systems, database search and retrieve systems, automatic call routing, and language translation need to address the possible presence of multiple languages in the input …

Daily activity recognition with large-scaled real-life recording datasets based on deep neural network using multi-modal signals
T Hayashi, M Nishida, N Kitaoka, T Toda… – IEICE Transactions on …, 2018 – jstage.jst.go.jp
IEICE Trans. Fundamentals, Vol. E101–A, No. 1, January 2018, p. 199. Paper: Daily Activity Recognition with Large-Scaled Real-Life Recording Datasets Based on Deep Neural Network Using Multi-Modal Signals …

Is ATIS too shallow to go deeper for benchmarking Spoken Language Understanding models?
F Béchet, C Raymond – InterSpeech, 2018 – hal.inria.fr
… If large benchmark datasets can be found for tasks such as image and text classification, speech and speaker recognition, this is not the … SLU has been mostly studied in the context of human-machine interaction, such as Spoken Dialog Systems (SDS) for information access or …

Emotion Identification from raw speech signals using DNNs
M Sarma, P Ghahremani, D Povey, NK Goel… – Proc. Interspeech …, 2018 – danielpovey.com
… In the evolving setups of intelligent commercial dialogue systems and smart call centers, emotion information obtained from speech can be … D. Garcia-Romero, G. Sell, D. Povey and S. Khudanpur, “X-VECTORS: Robust DNN Embeddings for Speaker Recognition,” in ICASSP …

Rapid development of new TTS voices by neural network adaptation
T Delić, S Suzić, M Sečujski… – 2018 17th International …, 2018 – ieeexplore.ieee.org
… The research was conducted within the project “Development of Dialogue Systems for Serbian and Other South Slavic Languages” (TR32035), financed by Ministry of education, science and technological development of Republic of Serbia, EUREKA project DANSPLAT (E …

A critical review and analysis on techniques of speech recognition: The road ahead
AV Haridas, R Marimuthu… – International Journal of …, 2018 – content.iospress.com
… news transcript, from reading style voice dictation to impulsive dialogue systems, and so … After that a speaker recognition approach using this enhanced algorithm to coach Support … three novel language modeling techniques that employ semantic study for spoken dialog systems …

Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
R Arumugam, R Shanmugamani – 2018 – books.google.com
… Chatbot’s are becoming an integrated part of any website; while virtual assistants are gaining … deep learning solutions for automatic accounting at SAP, Singapore, and conversational chatbots at Evie … He is also a research assistant with the dialog systems group at Laboratory of …

Improving the User Experience of Electronic University Enrollment
L Galko, J Porubän, J Senko – 2018 16th International …, 2018 – ieeexplore.ieee.org
… A. Solution For testing the usability of chatbots for university application submission systems we … Available: https://www.topbots.com/4-critical-steps-to-maximize-chatbot-retention-engagement/ [18 … and J. Juhár, “A study of acoustic features for emotional speaker recognition in i …

Deep Learning Approaches to Feature Extraction, Modelling and Compensation for Short Duration Language Identification
S Fernando – 2018 – researchgate.net
Deep Learning Approaches to Feature Extraction, Modelling and Compensation for Short Duration Language Identification. Sarith Fernando. A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy …

Speaker Verification Using Adapted Bounded Gaussian Mixture Model
M Azam, N Bouguila – … on Information Reuse and Integration (IRI …, 2018 – ieeexplore.ieee.org
… A speaker recognition system performs two tasks: speaker identification and verification … is to validate and confirm the claim of a speaker about its identity [1], [2]. Speaker verification has been used in many applications such as human-machine dialog systems, medical, forensics …

Perceptual Features Based Rapid and Robust Language Identification System for Various Indian Classical Languages
A Revathi, C Jeyalakshmi… – Computational Vision and …, 2018 – Springer
… language from the set of speech utterances. Some of the notable applications of LID include global communications, call routing systems, multilingual dialog systems and multilingual translation systems etc. LID is also a topic of …

Multimodal Sensing and Data Processing for Speaker and Emotion Recognition using Deep Learning Models with Audio, Video and Biomedical Sensors
F Abtahi – 2018 – academicworks.cuny.edu
… We have chosen two important real-world applications that need to deal with multimodal data: 1) Speaker recognition and identification; 2) Facial expression recognition and emotion detection … Page 21. 6 Chapter 2 Multimodal Speaker Recognition 2.1 Introduction …

PATHOSnet: parallel, audio-textual, hybrid organization for sentiment network
J ORIGGI – 2018 – politesi.polimi.it
Page 1. POLITECNICO Scuola di Ingegneria Ind Corso di Laurea Magistrale PATHOSnet: Parallel, Audio Advisor: Prof. Licia SBATTELLA Co-advisor: Ing. Roberto TEDESCO Academic POLITECNICO DI MILANO Scuola di Ingegneria Industriale e dell’Informazione …

Detecting early signs of dementia in conversation
B Mirheidari – 2018 – etheses.whiterose.ac.uk
… Page 13. SDS Spoken Dialogue System Sez Seizure SGD Stochastic Gradient Descent … SPLICE Stereo Piece-wise Linear Compensation for Environment SRE Speaker Recognition Evaluation SVM Support Vector Machine TCD Transcranial Doppler …

Face-Voice Matching using Cross-modal Embeddings
S Horiguchi, N Kanda, K Nagamatsu – 2018 ACM Multimedia …, 2018 – dl.acm.org
… ACM, New York, NY, USA, page 9 pages. https://doi.org/10.1145/3240508.3240601 1 INTRODUCTION Speaker diarization, which identifies “who spoke when,” is an essential process in dialogue systems or robotics services …

Deep Learning with Azure: Building and Deploying Artificial Intelligence Solutions on the Microsoft AI Platform
M Salvaris, D Dean, WH Tok – 2018 – books.google.com
… Take Unilever, for example: They have built a collection of chatbots with a master bot to help their employees interact with human resources services and all services inside the enterprise. Jabil uses AI for quality control in the circuit board manufacturing process …

ProMETheus: An Intelligent Mobile Voice Meeting Minutes System
H Liu, X Wang, Y Wei, W Shao, J Liono… – Proceedings of the 15th …, 2018 – dl.acm.org
… In speaker recognition, we use MFCC or MFEC (Mel-frequency Energy Coefficient) as the original data … is known as seq2seq [23] and is widely used in the scenarios where there are input sequences and output sequences, such as machine translations, and chat bots …

Authorship Attribution of Noisy Text Data With a Comparative Study of Clustering Methods
Z Hamadache, H Sayoud – International Journal of Knowledge and …, 2018 – igi-global.com
… Mixture Model (GMM) Clustering Approach According to our knowledge, we have found that the GMM approach (Narayanaswamy et al., 2005; Reynolds et al., 2009; Kinnunen et al., 2011; Liu et al., 2012) was commonly suggested in the literature for Speaker Recognition (SR) rather …

Interrupting Drivers for Interactions: Predicting Opportune Moments for In-vehicle Proactive Auditory-verbal Tasks
A Kim, W Choi, J Park, K Kim, U Lee – … of the ACM on Interactive, Mobile …, 2018 – dl.acm.org
Page 1. 175 Interrupting Drivers for Interactions: Predicting Opportune Moments for In-vehicle Proactive Auditory-verbal Tasks AUK KIM, KAIST, South Korea WOOHYEOK CHOI, KAIST, South Korea JUNGMI PARK, Samsung …

Dissertations in Forestry and Natural Sciences
A SHOLOKHOV – epublications.uef.fi
… In contrast to ASR, automatic speaker recognition (SR) is concerned with the identity of a speaker, regardless of what was said … and physical access control (e.g. controlling access to a bank account or a physical space) and personalization (e.g. personalized dialogue system) …

Synthetic speech detection using fundamental frequency variation and spectral features
M Pal, D Paul, G Saha – Computer Speech & Language, 2018 – Elsevier
… train any statistical model. Henceforth, we compress the spectrum by using a 7 filter filterbank, which is similar to the technique of passing the FFT power spectrum through a mel filterbank in speaker recognition. The filters in …

Improvements in Spoken Query System to Access the Agricultural Commodity Prices and Weather Information in Kannada Language/Dialects
TG Yadava, HS Jayanna – Journal of Intelligent Systems – degruyter.com
Abstract: In this paper, the improvements in the recently developed end to end spoken query system to access the agricultural commodity prices and weather information in Kannada language/dialects is demonstrated. The spoken query system consists of interactive voice response …

Conversational Speech Understanding in highly Naturalistic Audio Streams
L Kaushik – 2018 – utd-ir.tdl.org
… other context based conditions. Conversational Speech Understanding comprises three basic modules namely; Speaker recognition/verification attributing to “who is speaking”, Contin … traditional speech processing technology such as robust speech and speaker recognition. In …

Design of a Phonetically Balanced Code-Mixed Hindi-English Read Speech Corpus for Automatic Speech Recognition
A Pandey – 2018 – web2py.iiit.ac.in
… The corpus is also being put to use in speaker-recognition experiments and in computing the acoustic-phonetic correlation of the … et al [49] design a set of intrasentential code-mixed prompts through HALEF (Help Assistant – Language-Enabled and Free) dialog system 1 in a …

Early Turn-Taking Prediction for Human Robot Collaboration
T Zhou – 2018 – search.proquest.com
Page 1. EARLY TURN-TAKING PREDICTION FOR HUMAN ROBOT COLLABORATION by Tian Zhou A Dissertation Submitted to the Faculty of Purdue University In Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy School of Industrial Engineering …

Etude de la direction du regard dans le cadre d’interactions sociales incluant un robot Gaze Direction in the context of Social Human-Robot Interaction
PA Bartoli, M Salzmann, S Ba, H Hung, R Horaud… – hal.inria.fr
Page 1. HAL Id: tel-01936821 https://hal.inria.fr/tel-01936821 Submitted on 27 Nov 2018 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not …

Gaze direction in the context of social human-robot interaction
B Massé – 2018 – tel.archives-ouvertes.fr
Page 1. HAL Id: tel-01936821 https://tel.archives-ouvertes.fr/tel-01936821v3 Submitted on 11 Mar 2019 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not …

Advance compression and watermarking technique for speech signals
R Thanki, K Borisagar, S Borra – 2018 – Springer
… Some of the topics covered in this series include the presentation of real life commercial deployment of spoken dialog systems, contemporary methods of speech parameterization, developments in information security for automated speech, forensic speaker recognition, use …

The Importance of Context When Recommending TV Content: Dataset and Algorithms
MS Kristoffersen, SE Shepstone, ZH Tan – arXiv preprint arXiv:1808.00337, 2018 – arxiv.org
… That is, they used a number of sensors to automatically extract contextual settings of TV viewing events, e.g. Bluetooth trackers to identify present users and their activity level, together with chatbot sessions for obtaining self-reported information of e.g. social context …

Speech-Based Emotion Recognition: Linguistic and Saliency-Based Systems
KW Gamage – 2018 – unsworks.unsw.edu.au
Page 1. Speech-Based Emotion Recognition: Linguistic and Saliency-Based Systems Kalani Wataraka Gamage A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy School of Electrical Engineering and Telecommunications …

A multiobjective learning and ensembling approach to high-performance speech enhancement with compact neural network architectures
Q Wang, J Du, LR Dai, CH Lee – IEEE/ACM Transactions on Audio …, 2018 – dl.acm.org
… MFCC features are popular in both ASR [42] and speaker recognition [43]. High similarity exists between the spacing of Mel-filtering and the human perception scale [39]. When compared to LPS features, MFCC features are concentrated in a low-dimensional …

Deep Learning with Azure
M Salvaris, D Dean, WH Tok – Springer
… Take Unilever, for example: They have built a collection of chat bots with a master bot to help their employees interact with human resources services and all services inside the enterprise. Jabil uses AI for quality control in the circuit board manufacturing process …

Exploiting temporal context in speech technologies using LSTM recurrent neural networks
R Zazo Candil – 2018 – repositorio.uam.es
… sentiment analysis in text [Dos Santos and Gatti, 2014], speaker recognition [Richardson et al., 2015], document retrieval [Le and Mikolov, 2014 … As an example, systems such as Google Assistant, Apple’s Siri or Amazon’s Alexa that are controlled with dialogue systems are now …

Privacy and Cyber Security on the Books and on the Ground
MC Dähn, I Pernice, J Pohle, Z Goldman, P Nemitz… – 2018 – papers.ssrn.com
Page 1. Electronic copy available at: https://ssrn.com/abstract=3250354 ALEXANDER VON HUMBOLDT INSTITUTE FOR INTERNET AND SOCIETY BERLIN, GERMANY PRIVACY AND CYBER SECURITY ON THE BOOKS AND ON THE GROUND …

Direction of arrival estimation and localization of multi-speech sources
N Dey, AS Ashour – 2018 – Springer
… Some of the topics covered in this series include the presentation of real life commercial deployment of spoken dialog systems, contemporary methods of speech parameterization, developments in information security for automated speech, forensic speaker recognition, use …