Kaldi ASR - Meta-Guide.com

Notes:

Kaldi is an open-source speech recognition toolkit widely used by researchers and developers working on automatic speech recognition (ASR). It is written mainly in C++, with its training and evaluation recipes driven largely by shell scripts, and is freely available under the Apache License v2.0. Kaldi provides tools for building and evaluating speech recognition systems, including feature extraction, acoustic modeling, and decoding with language models represented as weighted finite-state transducers (built on the OpenFst library). It also ships example recipes for standard corpora such as Resource Management (RM), Wall Street Journal (WSJ), and LibriSpeech. Its accuracy and flexibility have made it a common baseline in speech research, and it is used in applications such as transcription, speech translation, and voice command systems.
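Feature extraction in a toolkit like Kaldi begins by cutting the waveform into short, overlapping frames, from which features such as MFCCs are then computed. The Python sketch below shows only that framing step. It assumes 25 ms windows with a 10 ms shift and discards any partial frame at the end of the signal; we believe these match Kaldi's defaults, but treat them as assumptions, and the function name `frame_signal` is hypothetical.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], sample_rate: int = 16000,
                 frame_length_ms: float = 25.0,
                 frame_shift_ms: float = 10.0) -> List[List[float]]:
    """Split a waveform into overlapping frames (hypothetical helper).

    Partial frames at the end are discarded, so for N samples, frame
    length L and shift S the result holds 1 + (N - L) // S frames.
    """
    frame_length = int(sample_rate * frame_length_ms / 1000)  # samples per frame
    frame_shift = int(sample_rate * frame_shift_ms / 1000)    # samples between frame starts
    if frame_length <= 0 or frame_shift <= 0:
        raise ValueError("frame length and shift must be positive")
    if len(samples) < frame_length:
        return []
    num_frames = 1 + (len(samples) - frame_length) // frame_shift
    return [list(samples[i * frame_shift:i * frame_shift + frame_length])
            for i in range(num_frames)]
```

For one second of 16 kHz audio this gives 1 + (16000 - 400) // 160 = 98 frames of 400 samples each; each frame would then be windowed and converted into a feature vector.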

Daniel Povey is a speech recognition researcher, best known as the main developer and maintainer of Kaldi. He has published numerous papers on speech recognition, several of which appear in the list below, and has worked as a researcher at Microsoft Research, among other research institutions and organizations.

Resources:

Wikipedia:

Daniel Povey
Kaldi (software)

The Munich feature enhancement approach to the 2nd CHiME challenge using BLSTM recurrent neural networks F Weninger, J Geiger… – … of the 2nd …, 2013 – mmk.e-technik.tu-muenchen.de … Finally, we perform experiments with the Kaldi ASR system. … re-training 47.62 56.86 50.25 45.08 39.25 34.56 31.81 42.97 Kaldi ASR system (reverberated model, maximum likelihood training) Baseline 71.82 85.97 80.29 74.22 66.00 56.51 46.39 68.23 …

Black box optimization for automatic speech recognition S Watanabe, J Le Roux – Acoustics, Speech and Signal …, 2014 – ieeexplore.ieee.org … The ASR experiments were performed by using the Kaldi ASR toolkit [23], and followed the standard recipes in the toolkit for RM-ML, RM-NN, and WSJ-DT tasks. The human expert results in our experiments were basically obtained using the parameters as tuned in the recipes. …

Free on-line speech recogniser based on Kaldi ASR toolkit producing word posterior lattices O Plátek, F Jurčíček – 15th Annual Meeting of the Special Interest Group …, 2014 – aclweb.org Abstract This paper presents an extension of the Kaldi automatic speech recognition toolkit to support on-line recognition. The resulting recogniser supports acoustic models trained using state-of-the-art acoustic modelling techniques. As the recogniser produces word …

Alex: A Statistical Dialogue Systems Framework F Jurčíček, O Dušek, O Plátek, L Žilka – Text, Speech and Dialogue, 2014 – Springer … While the system was deployed, we have been gradually improving our ASR and SLU components. A performance comparison of Google ASR with Kaldi ASR trained on our data is shown in Figure 4 (left). One can see that the Kaldi ASR improves as …

Alex: Bootstrapping a Spoken Dialogue System for a New Domain by Real Users O Dušek, O Plátek, L Žilka… – 15th Annual Meeting of …, 2014 – anthology.aclweb.org … [Figure 1: ASR word error rate (%) depending on the size of in-domain language model training data; x-axis: training set portion (0 to 1); curves: Google ASR, Kaldi ASR] The full training set amounts to 9495 utterances …
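Figure 1 in this snippet reports word error rate (WER), the usual ASR metric: the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal Python sketch using word-level edit distance follows; it is a stand-alone illustration, not the scoring code used by the authors or by Kaldi, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` is 1/6 (one deletion over six reference words), or about 16.7%.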

An investigation of single-pass ASR system combination for spoken language understanding F Bougares, M Rouvier, N Camelin, P Deléglise… – Statistical Language and …, 2013 – Springer … Before concluding, section 4 presents the experiments with the comparison of results obtained by the three state-of-the-art ASR systems: CMU Sphinx project [21], RWTH ASR toolkit [22], Kaldi ASR system [20] and their combination. 2 Spoken Language Understanding …

A Reranking Approach For Recognition And Classification Of Speech Input In Conversational Dialogue Systems F Morbini, K Audhkhasi, R Artstein, M Van Segbroeck, K Sagae – ict.usc.edu … To generate the custom language model for the AT&T and OtoSense-Kaldi ASR we used the following procedure: for the training set, we divided it into 10 folds and for each fold, we used the manual transcriptions found in the other 9 folds to generate the text file used to train the …
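The fold procedure quoted above keeps each utterance's own transcription out of the language model used to recognise it. A Python sketch of that split follows, assuming a simple round-robin assignment of transcriptions to folds (the snippet does not say how the folds were formed); the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def jackknife_lm_texts(transcripts: Sequence[str], num_folds: int = 10
                       ) -> List[Tuple[List[str], List[str]]]:
    """Return one (held_out, lm_training_text) pair per fold.

    Transcriptions are assigned to folds round-robin (an assumption); the
    LM text for fold k is every transcription that is not in fold k.
    """
    if num_folds < 2 or num_folds > len(transcripts):
        raise ValueError("need 2 <= num_folds <= number of transcriptions")
    folds = [list(transcripts[k::num_folds]) for k in range(num_folds)]
    splits = []
    for k, held_out in enumerate(folds):
        # Concatenate the other num_folds - 1 folds into the LM training text.
        lm_text = [line for j, fold in enumerate(folds) if j != k for line in fold]
        splits.append((held_out, lm_text))
    return splits
```

Each `lm_text` list would then be written to a text file and passed to whatever LM training tool is in use, and the held-out fold decoded with the resulting model.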

Integration of an On-line Kaldi Speech Recogniser to the Alex Dialogue Systems Framework O Plátek, F Jurčíček – Text, Speech and Dialogue, 2014 – Springer … 10. The Kaldi ASR toolkit (2014), http://sourceforge.net/projects/kaldi 11. The Alex Dialogue Systems Framework (2014), https://github.com/UFAL-DSG/alex 12. The OnlineLatgenRecogniser (2014), https://github.com/UFAL-DSG/pykaldi 13. …

Improving Named Entity Recognition with Prosodic Features D Katerenchuk, A Rosenberg – … Annual Conference of …, 2014 – mazsola.iit.uni-miskolc.hu … 3.1. Automatic Speech Recognition Systems For the experiments in this paper, we use the KALDI ASR Toolkit developed by [11]. Using standard training recipes for the ASR, we build two different acoustic models trained on WSJ data [12]. …

A Pitch Extraction Algorithm Tuned For Automatic Speech Recognition P Ghahremani, B BabaAli, D Povey, K Riedhammer… – danielpovey.com … improvements for non-tonal languages. Our method, which we are calling the Kaldi pitch tracker (because we are adding it to the Kaldi ASR toolkit), is a highly modified version of the getf0 (RAPT) algorithm. Unlike the original getf0 …
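RAPT-style pitch trackers, including the Kaldi pitch tracker described here, build on the normalized cross-correlation function (NCCF) between a frame and a lagged copy of itself: a strong peak at lag T suggests a period of T samples. The Python sketch below picks the best lag for a single frame. It is a simplification for illustration, not the Kaldi implementation: it does no smoothing across frames, and the linear `lag_penalty` against long lags (to discourage octave errors) is our own assumption.

```python
import math
from typing import Sequence, Tuple


def nccf(frame: Sequence[float], lag: int, window: int) -> float:
    """Normalized cross-correlation of frame[0:window] with frame[lag:lag+window]."""
    a = frame[:window]
    b = frame[lag:lag + window]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def estimate_f0(frame: Sequence[float], sample_rate: int,
                min_f0: float = 50.0, max_f0: float = 400.0,
                lag_penalty: float = 0.3) -> Tuple[float, float]:
    """Return (f0_hz, nccf_at_chosen_lag) for one frame of samples."""
    min_lag = int(sample_rate / max_f0)             # shortest period considered
    max_lag = int(math.ceil(sample_rate / min_f0))  # longest period considered
    if min_lag < 1 or max_lag <= min_lag:
        raise ValueError("invalid pitch range for this sample rate")
    window = len(frame) - max_lag
    if window <= 0:
        raise ValueError("frame is too short for the requested min_f0")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Linearly down-weight longer lags so that a multiple of the true
        # period (an octave error) does not win a near-tie.
        weight = 1.0 - lag_penalty * (lag - min_lag) / (max_lag - min_lag)
        score = nccf(frame, lag, window) * weight
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag, nccf(frame, best_lag, window)
```

For a 200 Hz sine sampled at 16 kHz, the chosen lag is 80 samples, giving an estimate of 200 Hz with an NCCF close to 1; the NCCF value can also serve as a rough voicing cue.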

System for Automated Speech and Language Analysis (SALSA) K Marek-Spartz, B Knoll, R Bill… – … Conference of the …, 2014 – mazsola.iit.uni-miskolc.hu … For each set of MFCC vectors representing the speech input frames, the KALDI ASR decoder was used to find the highest likelihood path through the lattice of hypotheses constructed based on the language and acoustic models described above. …
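Finding the highest-likelihood path is a dynamic-programming (Viterbi) search. Kaldi's decoders run this kind of search over a large weighted finite-state transducer with beam pruning; the Python sketch below shows only the core recursion on a toy hidden Markov model with log-probabilities. State and observation names are placeholders, and the function is hypothetical, not part of any Kaldi API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(observations: Sequence[str], states: Sequence[str],
            log_start: Dict[str, float],
            log_trans: Dict[str, Dict[str, float]],
            log_emit: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the most likely state sequence and its total log-probability."""
    if not observations:
        raise ValueError("need at least one observation")
    # best[s]: log-prob of the best path ending in state s at the current frame
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        prev_best, best, pointer = best, {}, {}
        for s in states:
            # Choose the predecessor that maximizes path score + transition score.
            best_prev, best_score = states[0], -math.inf
            for p in states:
                score = prev_best[p] + log_trans[p][s]
                if score > best_score:
                    best_prev, best_score = p, score
            best[s] = best_score + log_emit[s][obs]
            pointer[s] = best_prev
        backpointers.append(pointer)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, best[last]
```

Working in log-probabilities turns products into sums and avoids numerical underflow on long utterances, which is also why real decoders score in the log domain.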

Speaker adaptation of DNN-based ASR with i-vectors: Does it actually adapt models to speakers? M Rouvier, B Favre – Fifteenth Annual Conference of the …, 2014 – mazsola.iit.uni-miskolc.hu … GLR). Then a hierarchical agglomerative clustering is used to group the segments belonging to the same speakers using the BIC distance. … 4.3. ASR In our experiments we used the Kaldi ASR toolkit [15]. The speech …

Acoustic Model Merging Using Acoustic Models from Multilingual Speakers for Automatic Speech Recognition TP Tan, L Besacier… – … Conference on Asian …, 2014 – hal.univ-grenoble-alpes.fr … A snippet example of a decision tree used in Kaldi ASR system [9] is shown in Figure 3. At each node of a decision tree, question is asked about the context of a triphone (left phone, right phone, center phone and pdf id). … Figure 3. Decision tree in Kaldi ASR system. …
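The decision-tree lookup described in this snippet can be sketched as a chain of yes/no questions about the triphone context that ends in a leaf holding a pdf id. The Python below is a simplified illustration with hypothetical class names, not Kaldi's tree format; the snippet also lists the pdf id among the things a question can ask about, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# Triphone context: (left phone, center phone, right phone).
Triphone = Tuple[str, str, str]


@dataclass
class Leaf:
    pdf_id: int  # index of the output distribution for this context


@dataclass
class Question:
    position: int              # 0 = left, 1 = center, 2 = right phone
    phone_set: FrozenSet[str]  # "is the phone at `position` in this set?"
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Question]


def lookup_pdf(tree: Node, triphone: Triphone) -> int:
    """Walk the tree from the root, answering each question, until a leaf."""
    node = tree
    while isinstance(node, Question):
        node = node.yes if triphone[node.position] in node.phone_set else node.no
    return node.pdf_id
```

Because many contexts fall into the same leaf, the tree lets rarely seen triphones share a well-trained distribution with similar contexts.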

Using Malay Resources To Bootstrap ASR For A Very Under-Resourced Language: Iban SS Juan, L Besacier, S Rossato – SLTU 2014, 2014 – hal.archives-ouvertes.fr … 13,960 76 4. BASELINE SPEECH RECOGNIZERS We experimented Kaldi ASR system [14] for Iban, an open source toolkit based on FST. Acoustic models were trained using three lexicons and the training transcript. Each …

LibriSpeech: An ASR Corpus Based On Public Domain Audio Books V Panayotov, G Chen, D Povey, S Khudanpur – danielpovey.com … The corpus is freely available under the very permissive CC BY 4.0 license [3] and there are example scripts in the open source Kaldi ASR toolkit [4], that demonstrate how high quality acoustic models can be trained on this data. …

Exploiting un-transcribed foreign data for speech recognition in well-resourced languages D Imseng, B Potard, P Motlicek… – … , Speech and Signal …, 2014 – infoscience.epfl.ch … 4. EXPERIMENTAL SETUP We used the Kaldi ASR toolkit [14] for our experiments. An overview over all evaluated systems is given in Table 2. 4.1. Baseline Our baseline ASR was trained using MP-FR-core. As an acoustic model, we trained a DNN with 3 hidden layers. …

Extracting Deep Neural Network Bottleneck Features Using Low-Rank Matrix Factorization Y Zhang, E Chuangsuwanich, J Glass – people.csail.mit.edu … 3.2. Baseline HMM systems Our baseline HMM systems were trained using the Kaldi ASR toolkit [15]. We used 13-dimensional PLP features concatenated with F0 estimates and the probability of voicing [16]. Conversation …

Semi-Supervised G2P Bootstrapping and its Application to ASR for a very Under-Resourced Language: Iban SS Juan, L Besacier, S Rossato – mica.edu.vn … 4. BASELINE SPEECH RECOGNIZERS We experimented Kaldi ASR system [20] for Iban, an open source toolkit based on Finite States Transducers. Acoustic models were trained using three lexicons and the training transcript. …

Comparing a high and low-level deep neural network implementation for automatic speech recognition J Ray, B Thompson, W Shen – Proceedings of the 1st …, 2014 – conferences.computer.org … The specific ASR task considered in this work is phone recognition on the TIMIT speech corpus [6]. We compare our Theano DNN implementation to a hand-optimized C++/CUDA DNN implementation [18] from the popular Kaldi ASR toolkit [15]. …

Multilingual Deep Neural Network based Acoustic Modeling For Rapid Language Adaptation NT Vu, D Imseng, D Povey, P Motlicek… – … , Speech and Signal …, 2014 – infoscience.epfl.ch … 2. DNN TRAINING This section describes some key features of the Kaldi DNN training recipe [15] – part of the Kaldi ASR toolkit [16] – which we used in our study. Currently Kaldi contains two parallel implementations for DNN training. Both recipes …

Automatic sentiment extraction from YouTube videos L Kaushik, A Sangwan… – … Speech Recognition and …, 2013 – ieeexplore.ieee.org … On the other hand, the With Noun system performs the worst. We suspect that having more noun features makes the system more domain dependent. This hurts the generality of the sentiment model which in turn results in poorer performance. KALDI ASR …

Should deep neural nets have ears? The role of auditory features in deep learning approaches AMC Martinez, N Moritz, BT Meyer – Fifteenth Annual Conference of the …, 2014 – 193.6.4.39 … Owing to the capability of layer-wise pre-training using Restricted Boltzmann Machines (RBM) [28] and optimization via stochastic gradient descent (SGD) on the graphics processing unit (GPU), we opted for this particular Kaldi ASR toolkit DNN implementation [29] which can …

Graph-based Re-ranking using Acoustic Feature Similarity between Search Results for Spoken Term Detection on Low-resource Languages H Lee, Y Zhang, E Chuangsuwanich… – … Annual Conference of …, 2014 – blog.narotama.ac.id … 3.1. Data and Recognition Systems The audio corpora that we used in our research were from the limited language pack condition of the IARPA Babel program. The recognizers were trained using the Kaldi ASR toolkit [17]. …

Improving deep neural network acoustic models using generalized maxout networks X Zhang, J Trmal, D Povey, S Khudanpur – Proc. ICASSP, 2014 – danielpovey.com … 2. OUR DNN RECIPE In this section we explain key features of our baseline DNN training recipe. This recipe is part of the Kaldi ASR toolkit [14]. In order to avoid confusion we should explain that Kaldi currently contains two parallel implementations for DNN training. …
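The "generalized maxout" in this title refers to nonlinearities that reduce each group of hidden-unit inputs to a single output. Standard maxout keeps the group maximum; one generalization studied in this line of work is the p-norm, (sum of |x_i|^p over the group)^(1/p). The Python sketch below shows both on a plain list; it is our illustration, not the authors' Kaldi code, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def group_pool(x: Sequence[float], group_size: int,
               p: Optional[float] = None) -> List[float]:
    """Reduce consecutive groups of `group_size` inputs to one output each.

    p=None gives standard maxout (group maximum); a number p >= 1 gives
    the p-norm (sum |x_i|**p) ** (1/p) of each group.
    """
    if group_size <= 0 or len(x) % group_size != 0:
        raise ValueError("input length must be a positive multiple of group_size")
    if p is not None and p < 1:
        raise ValueError("p must be at least 1")
    outputs = []
    for start in range(0, len(x), group_size):
        group = x[start:start + group_size]
        if p is None:
            outputs.append(max(group))  # maxout: keep the largest input
        else:
            outputs.append(sum(abs(v) ** p for v in group) ** (1.0 / p))
    return outputs
```

Note that the p-norm ignores sign: for the group [1, -3], maxout gives 1 while the 2-norm gives sqrt(10), so the two nonlinearities can behave quite differently.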

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems. K Audhkhasi, AM Zavou, PG Georgiou… – IEEE/ACM Transactions …, 2014 – sail.usc.edu … data set. Section IV describes our experimental setup using the Kaldi ASR toolkit [29] and ASR confidence estimation using a variety of lattice-based and prosodic features within a conditional random field (CRF) [30] model. We …

Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training A Narayanan, DL Wang – cse.ohio-state.edu Technical Report OSU-CISRC-9/14-TR15 Department of Computer Science and Engineering The Ohio State University Columbus, OH 43210-1277 Ftpsite: ftp.cse.ohio-state.edu Login: anonymous Directory: pub/tech …