Julius LVCSR 2016


Notes:

Large-vocabulary continuous speech recognition (LVCSR, more commonly known as speech-to-text, full transcription, or ASR – automatic speech recognition) uses words, modelled in short sequences such as bigrams and trigrams, as its basic unit.
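
To make the bigram and trigram idea concrete, the following is a minimal sketch, assuming whitespace-tokenized text and plain maximum-likelihood counts with no smoothing. The function names and the toy corpus are illustrative only; they are not taken from Julius or any other toolkit listed on this page.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw counts."""
    bigram_counts = Counter(ngrams(tokens, 2))
    # Only tokens that have a successor can act as a bigram history.
    history_counts = Counter(tokens[:-1])
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


# Toy corpus (hypothetical data, for illustration only).
corpus = "recognize speech with a model and recognize speech again".split()

bigrams = ngrams(corpus, 2)
trigrams = ngrams(corpus, 3)
p_speech_after_recognize = bigram_probability(corpus, "recognize", "speech")
p_with_after_speech = bigram_probability(corpus, "speech", "with")
```

A real recognizer would estimate such probabilities from a large text corpus and apply smoothing so that unseen word sequences do not receive zero probability.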

See also:

100 Best CMUSphinx Videos | HTK (Hidden Markov Model Toolkit) & Dialog Systems | Kaldi ASR


An Extension of the Slovak Broadcast News Corpus based on Semi-Automatic Annotation.
P Viszlay, J Stas, T Koctúr, M Lojka, J Juhár – LREC, 2016 – researchgate.net
… Our other serious interest is focused on the DNN-based LVCSR for Slovak using Kaldi (Povey et al., 2011). Therefore, we would like to replace the current speech recognition engine Julius by the WFST-based Kaldi engine. …

Implementation of Word Level Speech Recognition System for Punjabi Language
S Mittal, R Kaur – International Journal of Computer …, 2016 – pdfs.semanticscholar.org
… HResults provides a speaker-by-speaker breakdowns, transcriptions aligned timely and confusion matrices for the sake of global performance measures. 3. JULIUS Julius is a large vocabulary continuous speech recognition (LVCSR) engine. …

Automatic Generation of Proper Noun Entries in a Speech Recognizer for Local Information Recognition
K Shiga, T Nose, A Ito, R Masumura, H Masataki – 2016 – researchgate.net
… 1. Introduction Recently, large vocabulary continuous speech recognition (LVCSR) technology has been developed rapidly, and LVCSR-based technology is used on real fields such as information retrieval and call center. … We used Julius ver.4.3.12 as a decoder. …

Unsupervised speech transcription and alignment based on two complementary ASR systems
T Koctúr, P Viszlay, J Staš, M Lojka… – … 2016 26th International …, 2016 – ieeexplore.ieee.org
… Our ASR uses LVCSR recognition engine Julius based on triphone HMMs. Input speech is parametrized with standard MFCC feature vectors compound of 12 coefficient MFCC, log energy and delta and delta-delta coefficients. …

Evaluation of DNN-based Phoneme Estimation Approach on the NTCIR-12 SpokenQuery&Doc-2 SQ-STD Subtask.
N Sawada, H Nishizaki – NTCIR, 2016 – pdfs.semanticscholar.org
… 5.1.1 Speech recognition As shown in Figure 2 and Figure 5, the SDPWS speech data is recognized by the 10 ASRs. Julius ver. 4.1.3 [10], an open source decoder for LVCSR, is used in all the systems. We prepared two types …

Unsupervised acoustic corpora building based on variable confidence measure thresholding
T Koctúr, J Staš, J Juhár – ELMAR, 2016 International …, 2016 – ieeexplore.ieee.org
… uses LVCSR recognition engine Julius [9] based on triphone HMMs. 5 states left to right HMMs with non-emitting states on each end are used. As feature vectors, 12 MFCC coefficient with log energy and their first and second derivations were used. …

Context-dependent point process models for keyword search and detection-based ASR
C Liu, A Jansen, S Khudanpur – Acoustics, Speech and Signal …, 2016 – ieeexplore.ieee.org
… Lattice generation produces further KWS improvements by incorporating language models and better score normalization. Furthermore, lattices support LVCSR decoding, which gives reasonable performance for a first attempt on a difficult task … [8] Steve J Young, Julian J Odell …

Implementation of Statistical Speech to Text Recognition System for Punjabi Language
S Mittal, RG Kaur – 2016 – tudr.thapar.edu
… symbol that is chosen from the particular fixed vocabulary. 1.3 Julius Julius is a large vocabulary continuous speech recognition (LVCSR) engine. It is a high performance decoder software which is used by speech related researchers and developers [13]. It incorporates …

Language Models with RNNs for Rescoring Hypotheses of Russian ASR
I Kipyatkova, A Karpov – International Symposium on Neural Networks, 2016 – Springer
… Kipyatkova, I., Karpov, A.: Lexicon size and language model order optimization for Russian LVCSR. In: Železný, M., Habernal, I., Ronzhin, A. (eds.) SPECOM 2013. … 515–520 (2009). 23. Lee, A., Kawahara, T.: Recent development of open-source speech recognition engine julius. …

Policy compression for aircraft collision avoidance systems
KD Julian, J Lopez, JS Brush, MP Owen… – … (DASC), 2016 IEEE …, 2016 – ieeexplore.ieee.org
Policy Compression for Aircraft Collision Avoidance Systems Kyle D. Julian*, Jessica Lopez†, Jeffrey S. Brush†, Michael P. Owen‡ and Mykel J. Kochenderfer* *Department of Aeronautics and Astronautics, Stanford …

Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals
T Taniguchi, R Nakashima, H Liu… – Advanced …, 2016 – Taylor & Francis
ADVANCED ROBOTICS, 2016 VOL. 30, NOS. 11–12, 770–783 http://dx.doi.org/10.1080/01691864.2016.1159981 FULL PAPER Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals …

Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
N Sawada, H Nishizaki – IEICE TRANSACTIONS on Information …, 2016 – search.ieice.org
… As shown in Figs. 2 and 4, the speech data were recognized by the ten ASRs. Julius ver. 4.1.3 [30], an open source decoder for LVCSR, was used in all the systems. We prepared two types of AM and five types of LM for constructing the PTN. …

Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue
R Fernandez, W Minker, G Carenini… – Proceedings of the 17th …, 2016 – aclweb.org
… LVCSR System on a Hybrid GPU-CPU Embedded Platform for Real-Time Dialog Applications Alexei V. Ivanov, Patrick L. Lange and David … Investigating Fluidity for Human-Robot Interaction with Real-time, Real-world Grounding Strategies Julian Hough and David Schlangen . . …

Cartesian abstraction can yield ‘cognitive maps’
A Lőrincz – Procedia Computer Science, 2016 – Elsevier
… Improving deep neural networks for LVCSR using rectified linear units and dropout. In Acoust., Speech Sign. Proc. (ICASSP), 2013, pages 8609-8613. … Adam: A method for stochastic optimization. arXiv:1412.6980, 2014. [29]; E. Matthew, J. Larkum, Julius Zhu, Bert Sakmann; …

Speech Recognition System for Medical Domain
T Dodiya, S Jain – Citeseer
… D. Julius Julius is high-performance open source speech recognition software. It uses major speech recognition techniques, and performs a large vocabulary continuous speech recognition (LVCSR) task effectively in real-time processing [11]. …

Data selection from multiple ASR systems’ hypotheses for unsupervised acoustic model training
S Li, Y Akita, T Kawahara – Acoustics, Speech and Signal …, 2016 – ieeexplore.ieee.org
… They are numeric features output by the Julius decoder. … References [1] K. Yu, M. Gales, L. Wang and P. Woodland, Unsupervised training and directed manual transcription for LVCSR. Speech Communication, Vol. 52(7), pp. 652-663, 2010. …

Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems’ Outputs for Spoken Term Detection.
N Sawada, H Nishizaki – INTERSPEECH, 2016 – researchgate.net
… 4.1.2. ASR systems As shown in Figure 3, the speech data were recognized by the 10 ASRs. Julius ver. 4.1.3 [18], an open source decoder for … LVCSR, was used in all systems. We prepared two types of AMs and five types of LMs. …

Stem-affix based Uyghur morphological analyzer
M Ablimit, T Kawahara, A Pattar… – International Journal of …, 2016 – researchgate.net
… [5] A. Lee, T. Kawahara, and K. Shikano, “Julius - an open source real-time large vocabulary recognition engine”, In Proc. Eurospeech, (2001), pp. … [11] M. Nußbaum-Thom, AED Mousa, R. Schlüter and H. Ney, “Compound Word Recombination for German LVCSR”, In Proc. …

Confidence estimation for speech recognition systems using conditional random fields trained with partially annotated data
S Li, X Lu, S Mori, Y Akita… – … (ISCSLP), 2016 10th …, 2016 – ieeexplore.ieee.org
… On this stage, the training is based on the CE criterion, and sequential discriminative training is not conducted. For decoding, we use Julius ver.4.3.1 (DNN version3) [27] using the state transition probabilities of the GMM-HMM. …

Multimodal interaction with multiple co-located drones in search and rescue missions
J Cacace, A Finzi, V Lippiello – arXiv preprint arXiv:1605.07316, 2016 – arxiv.org
… Speech recognition: we rely on Julius [14], a two-pass large vocabulary continuous speech recognition (LVCSR) engine. A suitable grammar has been defined to parse the commands of the users. A N-best list of possible interpretations is continuously provided in output. …

A control architecture for multiple drones operated via multimodal interaction in search & rescue mission
J Cacace, A Finzi, V Lippiello, M Furci… – Safety, Security, and …, 2016 – ieeexplore.ieee.org
… Speech recognition: we rely on Julius [15], a two-pass large vocabulary continuous speech recognition (LVCSR) engine. A suitable grammar has been defined to parse the commands of the users. A N-best list of possible interpretations is continuously provided in output. …

Nonparametric bayesian double articulation analyzer for direct language acquisition from continuous speech signals
T Taniguchi, S Nagasaka… – IEEE Transactions on …, 2016 – ieeexplore.ieee.org
IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, VOL. 8, NO. 3, SEPTEMBER 2016 171 Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition From Continuous Speech Signals …

Melody Transcription Framework using Score Information for Noh Singing
K Itou, RC Repetto, X Serra – mtg.upf.edu
… pp. 303–310 [12] T. Kawahara, A. Lee, K. Takeda, K. Itou, and K. Shikano, “Recent progress of open-source LVCSR engine Julius and Japanese model repository,” Proceedings of ICSLP2004, 2004, pp. 3069–3072 …

Arabic Speech to Arabic Sign Language Translation System
AM Hassan, AS El-Din, HA Barakat, HT Mohamed… – academia.edu
… 19 Fig 2.5 Julius 20 Fig 2.6 HTK 21 … HTK Hidden Markov Model Toolkit HCI Human Computer Interaction IVR Interactive Voice Response LVCSR Large vocabulary continuous speech recognition MFs Manual features MRHE Mohammed Bin Rasheed Housing Establishment …

Combining i-vector representation and structured neural networks for rapid adaptation
C Wu, P Karanasou, MJF Gales – Acoustics, Speech and Signal …, 2016 – ieeexplore.ieee.org
… Xue, Ossama Abdel-Hamid, Hui Jiang, and Lirong Dai, “Direct adaptation of hybrid dnn/hmm model for fast speaker adaptation in lvcsr based on … [27] Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying A Liu, Gareth Moore, Julian Odell, Dave …

A Control Architecture for Unmanned Aerial Vehicles Operating in Human-Robot Team for Service Robotic Tasks
J Cacace – 2016 – fedoa.unina.it
UNIVERSITÀ DEGLI STUDI DI NAPOLI FEDERICO II SCUOLA DI DOTTORATO IN INGEGNERIA INFORMATICA ed AUTOMATICA DIPARTIMENTO DI INGEGNERIA ELETTRICA E TECNOLOGIE DELL’INFORMAZIONE A Control Architecture for Unmanned Aerial …

Phone Labeling Based on the Probabilistic Representation for Dysarthric Speech Recognition
Y Takashima, T Nakashika, T Takiguchi… – American Journal of …, 2016 – article.sapub.org
… we attempted to recognize utterances using a speaker-independent acoustic model for unimpaired people (This model is included in Julius 1). The acoustic … [17], Karel Veselý, Martin Karafiát, and František Grézl, “Convolutive bottleneck network features for LVCSR,” in ASRU …

Multilingvální rozpoznávač telefonní řeči na bázi DNN-HMM [Multilingual telephone speech recognizer based on DNN-HMM]
J Fiala – 2016 – dspace.cvut.cz
… An analysis of two implementations of acoustic modelling in LVCSR was carried out, i.e. … architecture. The experiments themselves were carried out for LVCSR with an acoustic model for the individual languages and for a multilingual system. …

Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training
S Li – 2016 – repository.kulib.kyoto-u.ac.jp
Title Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training( Dissertation_全文 ) Author(s) Li, Sheng Citation Kyoto University (京都大学) Issue Date 2016-03-23 URL https://dx.doi.org/10.14989/doctor.k19849 …

Semi-supervised acoustic model training by discriminative data selection from multiple ASR systems’ hypotheses
S Li, Y Akita, T Kawahara – IEEE/ACM Transactions on Audio, Speech …, 2016 – dl.acm.org
… dence. In their experiments, the method yielded a good result in a low-resource LVCSR setting. … conducted. For decoding, we use Julius ver.4.3.1 (DNN version3) [47] using the state transition probabilities of the GMM-HMM. …

Development of a Voice-Controlled Human-Robot Interface
W Khaewratana – 2016 – search.proquest.com
… In their work, a voice decoder software Julian was used for the speech recognition system to recognize voice … A voice decoder software Julius was used for the ASR system, allowing a recognition of conversation in Japanese. … Thai broadcast news corpus and an LVCSR system. …

Exploiting Semantic and Topic Context to Improve Recognition of Proper Names in Diachronic Audio Documents
I Sheikh – 2016 – tel.archives-ouvertes.fr
… 38 2 Background 40 2.1 Large Vocabulary Continuous Speech Recognition . . . . . 40 2.1.1 LVCSR Acoustic Modelling . . . . . 44 2.1.2 LVCSR Language Modelling . . . . . 47 2.2 The Out-of-Vocabulary Problem in LVCSR . . . . . …

A Survey of Markov Chain Models in Linguistics Applications
FS Al-Anzi, D AbuZeina – Computer Science & Information Technology – aircconline.com
… recognition (LVCSR) systems based on HMMs. … [44] Kupiec, Julian. “Robust part-of-speech tagging using a hidden Markov model.” Computer Speech & …

Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1
M Courbariaux, I Hubara, D Soudry, R El-Yaniv… – arXiv preprint arXiv …, 2016 – arxiv.org
Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1 Matthieu Courbariaux*1 MATTHIEU.COURBARIAUX@GMAIL.COM Itay Hubara*2 ITAYHUBARA@GMAIL.COM Daniel Soudry3 …

Spoken Term Detection Using SVM-Based Classifier Trained with Pre-Indexed Keywords
K Domoto, T Utsuro, N Sawada… – … on Information and …, 2016 – search.ieice.org
… note-taking system [1]. However, STD is difficult to use when searching for terms within a vocabulary-free framework because search terms are not known by the STD process prior to implementing a large vocabulary continuous speech recognition (LVCSR) system. … Julius ver. …

Speech to Text for Swedish using KALDI
E Kullmann – 2016 – diva-portal.org
… HMM Hidden Markov Models. LDA Linear Discriminant Analysis. LM Language Model. LVCSR Large Vocabulary Continuous Speech Recognition. MFCC Mel-Frequency Cepstral Coefficient. MLLT Maximum Likelihood Linear Transform. SAT Speaker Adaptive Training. …

Using linguistic knowledge for improving automatic speech recognition accuracy in air traffic control
VN Nguyen – 2016 – brage.bibsys.no
… Select an ASR framework and an ATC-related corpus for training – I first review ten well-known ASR open source frameworks including Bavieca, CMU Sphinx, Hidden Markov Model Toolkit (HTK), Julius, Kaldi, RWTH ASR, SPRAAK, CSLU Toolkit, The transLectures-UPV …

Achieving Automatic Speech Recognition for Swedish using the Kaldi toolkit
Z Mossberg – 2016 – diva-portal.org
… will refer to what is called Large Vocabulary Continuous Speech Recognition [LVCSR] and speaker independent systems. … There is one other known ASR model created from NST for Swedish for HTK Julius …

Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1
M Courbariaux, I Hubara, D Soudry, R El-Yaniv… – pdfs.semanticscholar.org
Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1 Matthieu Courbariaux*1 MATTHIEU.COURBARIAUX@GMAIL.COM Itay Hubara*2 ITAYHUBARA@GMAIL.COM Daniel Soudry3 …

Towards deep learning on speech recognition for Khmer language
C Lim – 2016 – mospace.umsystem.edu
… (1.22) where M is the number of words in the word sequence W. When dealing with large vocabulary continuous speech recognition (LVCSR), a generic Viterbi decoding search mentioned above is usually not sufficient. Hence, …

Quantized neural networks: Training neural networks with low precision weights and activations
I Hubara, M Courbariaux, D Soudry, R El-Yaniv… – arXiv preprint arXiv …, 2016 – arxiv.org
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations Itay Hubara* itayh@campuse.technion.ac.il Department of Electrical Engineering Technion – Israel Institute of Technology Haifa, Israel …

Neural autoregressive distribution estimation
B Uria, MA Côté, K Gregor, I Murray… – Journal of Machine …, 2016 – jmlr.org
Journal of Machine Learning Research 17 (2016) 1-37 Submitted 5/16; Published 9/16 Neural Autoregressive Distribution Estimation Benigno Uria benigno.uria@gmail.com Google DeepMind London, UK Marc-Alexandre Côté marc-alexandre.cote@usherbrooke.ca …

Data Driven Sample Generator Model with Application to Classification
AE Ulloa Cerna – 2016 – repository.unm.edu
Candidate Department This thesis is approved, and it is acceptable in quality and form for publication: Approved by the Thesis Committee: , Chairperson Alvaro Emilio Ulloa Cerna Mathematics and Statistics Department Erik Erhardt Li Li Marios Pattichis …

Low-Rank Representation For Enhanced Deep Neural Network Acoustic Models
G Luyet – 2016 – infoscience.epfl.ch
IDIAP RESEARCH REPORT LOW-RANK REPRESENTATION FOR ENHANCED DEEP NEURAL NETWORK ACOUSTIC MODELS Gil Luyet Idiap-RR-05-2016 MARCH 2016 Centre du Parc, Rue Marconi 19, PO Box …

Byte level language models
V Baisa – 2016 – is.muni.cz
Byte Level Language Models A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Vít Baisa supervisor: doc. PhDr. Karel Pala, CSc. Brno, June 2016 Acknowledgement …

Sign transition modeling and a scalable solution to continuous sign language recognition for real-world applications
K Li, Z Zhou, CH Lee – ACM Transactions on Accessible Computing …, 2016 – dl.acm.org
7 Sign Transition Modeling and a Scalable Solution to Continuous Sign Language Recognition for Real-World Applications KEHUANG LI, Georgia Institute of Technology ZHENGYU ZHOU, Research and Technology …

Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching
C Raffel – 2016 – search.proquest.com
Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching. Abstract. Sequences of feature vectors are a natural way of representing temporal data. Given a database of sequences …