
Notes:
SRILM (SRI Language Modeling Toolkit) is an open source, extensible, C++-based toolkit for language modeling. It is used to estimate n-gram language models (LMs), to interpolate them, and to build local language models, and it has an API for computing word language model probabilities. Disambig is one module in SRILM. Perplexity values can be computed with SRILM, and the package includes a standard script, compute-best-mix, for choosing interpolation weights. LMs weighted using SRILM have in turn been used in language model training. With an SRILM extension, maximum entropy language models with n-gram features can be estimated efficiently. SRILM can prune language models using an entropy criterion, even when the models are relatively small. N-gram models may be estimated for all of the possible combinations using SRILM.
SRILM reads and writes the standard ARPA (Advanced Research Projects Agency) file format for n-gram models, so it can be used to build n-gram language models in ARPA format. Standard n-gram language models may be trained with SRILM using interpolated modified Kneser-Ney smoothing. SRILM can be used to build bigram language models from various corpora, such as the English Gigaword corpus, and it has been applied to large data sets, for example a monolingual training corpus of 48,000,000 sentences.
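As a rough illustration of the ARPA format and the perplexity computation mentioned above, the following Python sketch parses a tiny hand-written ARPA model and scores a sentence with back-off. The toy model, its probabilities, and the function names (parse_arpa, log10_prob, perplexity) are assumptions made for this example and are not part of SRILM. SRILM's own tools also handle details, such as out-of-vocabulary words, that this sketch ignores.

```python
# Toy ARPA model (hypothetical values): log10 probability, n-gram, optional log10 back-off weight.
TOY_ARPA = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-1.0 </s>
-99 <s> -0.5
-0.5 a -0.25

\\2-grams:
-0.25 <s> a
-0.5 a </s>

\\end\\
"""


def parse_arpa(text):
    """Parse ARPA text into {ngram_tuple: (log10_prob, log10_backoff)}."""
    table = {}
    order = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue  # skip blank lines and the header counts
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\2-grams:" -> 2
            continue
        fields = line.split()
        words = tuple(fields[1:1 + order])
        # A missing back-off weight means log10 weight 0.0 (weight 1).
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[words] = (float(fields[0]), backoff)
    return table


def log10_prob(table, word, context):
    """Use the longest matching n-gram; otherwise back off to a shorter context."""
    ngram = context + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        raise KeyError(word)  # out-of-vocabulary words are not handled here
    backoff = table.get(context, (0.0, 0.0))[1]
    return backoff + log10_prob(table, word, context[1:])


def perplexity(table, sentence, order=2):
    """Perplexity of one sentence, counting the end-of-sentence token."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(1, len(words)):
        context = tuple(words[max(0, i - order + 1):i])
        total += log10_prob(table, words[i], context)
    return 10 ** (-total / (len(words) - 1))


table = parse_arpa(TOY_ARPA)
print(round(perplexity(table, "a a"), 3))  # about 3.162
```

The second "a" in the example sentence has no bigram entry, so its score is the back-off weight of the context "a" plus the unigram log probability of "a".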
A bigram language model used in recognition systems was generated using SRILM with modified Kneser-Ney back-off discounting. Trigram LMs may be estimated using SRILM with its default Good-Turing discounting method; one example is a capitalization-invariant trigram language model with Good-Turing discounting, estimated from the training corpus using the SRI language modeling toolkit. Modified Kneser-Ney (KN) models may be estimated on training set count files and applied to the test set using SRILM. A 4-gram target LM with unmodified Kneser-Ney back-off discounting was generated using SRILM. SRILM was used to train a 5-gram language model on the English sentences of the FBIS (Foreign Broadcast Information Service) corpus, and a 5-gram language model generated by SRILM can be used in the cube-pruning process. VMM (variable memory modeling) may be implemented within SRILM and compared to the default n-gram models. SRILM can also be used to train a 7-gram model on a training set. As another example, SRILM may be used to estimate individual language models for truthful and deceptive opinions. Translation models and generation models may be trained with the Moses toolkit. IRSTLM is another, similar language modeling toolkit. N-gram language models may be scored using z-scores. For example, z-scores have been used to compare documents by examining how many standard deviations each n-gram differs from its mean occurrence in a large collection, or text corpus, of documents (which form the “background” vector).
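A minimal sketch of the z-score comparison described above, in Python. It assumes whitespace tokenization, relative n-gram frequencies per document, and the population standard deviation over the background collection. These choices and the function names are illustrative assumptions and are not taken from any particular published method.

```python
from collections import Counter
from statistics import mean, pstdev


def ngram_freqs(text, n=2):
    """Relative frequency of each n-gram in one document."""
    tokens = text.lower().split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}


def ngram_z_scores(document, background_docs, n=2):
    """Z-score of each n-gram in `document` against the background collection."""
    background = [ngram_freqs(doc, n) for doc in background_docs]
    scores = {}
    for gram, freq in ngram_freqs(document, n).items():
        values = [b.get(gram, 0.0) for b in background]
        mu = mean(values)
        sigma = pstdev(values)
        if sigma > 0:  # n-grams with no variation in the background are skipped
            scores[gram] = (freq - mu) / sigma
    return scores


# Hypothetical toy data: unigram z-scores for a document dominated by "a".
print(ngram_z_scores("a a a a", ["a b a b", "a b c d"], n=1))
```

N-grams that never vary across the background documents get no score here; a real system would need a policy for them, such as smoothing the variance.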
- ZMERT
Resources:
Wikipedia:
- Code-switching
- Grammar induction (aka grammatical inference)
- Language model
- Minimal recursion semantics (aka MRS)
- Moses (machine translation)
- Syntactic pattern recognition (aka structural pattern recognition)
References:
- Language and Computers (2012)
- Spoken Language Understanding: Systems for Extracting Semantic Information from Speech (2011)
See also:
IRSTLM (IRST Language Modeling) Toolkit 2018 | Kaldi ASR
TF-LM: TensorFlow-based Language Modeling Toolkit
L Verwimp, P Wambacq – … of the Eleventh International Conference on …, 2018 – aclweb.org
… If we generate a debugging file for a 5-gram LM with interpolated modified Kneser-Ney smoothing with the SRILM toolkit, we can automatically define the optimal interpolation weights on the validation set, which are 0.24 for the n-gram model and 0.76 for the LSTM model …
The fifth ‘CHiME’ Speech Separation and Recognition Challenge: Dataset, task and baselines
J Barker, S Watanabe, E Vincent, J Trmal – arXiv preprint arXiv …, 2018 – arxiv.org
… The language model is selected automatically, based on perplexity on training data, but at the time of the writing, the selected LM is 3-gram trained by the MaxEnt modeling method as implemented in the SRILM toolkit [35–37] …
Continuous Punjabi speech recognition model based on Kaldi ASR toolkit
J Guglani, AN Mishra – International Journal of Speech Technology, 2018 – Springer
… One can easily implement N-gram model using the IRSTLM or SRILM toolkit which is included in their recipe (Lee et al. 2001) … Since Kaldi uses a FST-based framework to build LM model from the raw text, the SRILM toolkit is used …
Improved training of end-to-end attention models for speech recognition
A Zeyer, K Irie, R Schlüter, H Ney – arXiv preprint arXiv:1805.03294, 2018 – arxiv.org
… found 0.23 and 0.36 to be optimal respectively for Switchboard and LibriSpeech (the weight on the attention model is 1). For LibriSpeech, we also train Kneser-Ney smoothed n-gram count based language models [53] on the same BPE vocabulary set using SRILM toolkit [54] …
PADIC: extension and new experiments
K Meftouh, S Harrat, K Smaïli – 2018 – hal.archives-ouvertes.fr
… We used GIZA++ [15] for alignment and SRILM toolkit [16] to compute trigram language models using Kneser-Ney smoothing technique … Language modeling software such as the SRILM toolkit we used [16] allows the interpolation of these language models …
A comparison of language model training techniques in a continuous speech recognition system for Serbian
B Popović, E Pakoci, D Pekar – International Conference on Speech and …, 2018 – Springer
… The baseline language model is a 3-gram model trained on the training part of the database transcriptions and the Serbian journalistic corpus (about 600000 utterances), using the SRILM toolkit and the Kneser-Ney smoothing method, with a pruning value of 10⁻⁷ (previous …
Integration of machine translation in on-line multilingual applications: Domain adaptation
MȘ Duma, C Vertan – Language technologies for a multilingual Europe, 2018 – oapen.org
… scores are computed. For the language model training, we chose the srilm toolkit, 5 which is also open-source. It builds statistical language models and it also offers the possibility of interpolating language models. As for the …
Asr performance prediction on unseen broadcast programs using convolutional neural networks
Z Elloumi, L Besacier, O Galibert… – … , Speech and Signal …, 2018 – ieeexplore.ieee.org
… 3323M words in total – from EUbookshop, TED2013, Wit3, GlobalVoices, Gigaword, Europarl-v7, MultiUN, OpenSubtitles2016, DGT, News Commentary, News WMT, LeMonde, Trames, Wikipedia and transcriptions of our TrainAcoustic dataset) using SRILM toolkit [15] …
Enhancing recurrent neural network-based language models by word tokenization
HM Noaman, SS Sarhan… – … -centric Computing and …, 2018 – biomedcentral.com
Different approaches have been used to estimate language models from a given corpus. Recently, researchers have used different neural network architectures to estimate the language models from a given corpus using unsupervised learning neural networks capabilities. Generally …
Varying Background Corpora for SMT-Based Text Normalization
CM Veliz, O De Clercq, V Hoste – of the 6th Conference on … – repository.uantwerpen.be
… For building the SMT model we used Moses (Koehn et al., 2007). All LMs were built using the SRILM toolkit (Stolcke, 2002) with Witten-Bell discounting which has proven to work well on small data sets (Tiedemann, 2012). We …
A Fast-Converged Acoustic Modeling for Korean Speech Recognition: A Preliminary Study on Time Delay Neural Network
H Park, D Lee, M Lim, Y Kang, J Oh, JH Kim – arXiv preprint arXiv …, 2018 – arxiv.org
… windows of 25ms length. In addition, 100 dimensional iVectors were added to the MFCC input. The iVector presents speaker characteristics The SRILM toolkit [8] was used to generate tri-gram language model using frequency cut …
Intelligent Voice ASR system for Iberspeech 2018 Speech to Text Transcription Challenge.
N Dugan, C Glackin, G Chollet… – …, 2018 – pdfs.semanticscholar.org
… transcription challenge, training and development. These transcriptions are also used for the language model (LM) adaptation of the previous ASR model using the SRILM toolkit [6]. 2. Data preparation It was observed that the …
Robust Network Structures for Acoustic Model on CHiME5 Challenge Dataset
A Misbullah – Proc. CHiME 2018 Workshop on Speech Processing …, 2018 – isca-speech.org
… approximately. 3.3. Experimental Result In our experiment, we first evaluated the acoustic model using 3-gram language model. The language model is trained by using SRILM toolkit 4 from training corpus transcription. In …
Survey on Statistical and Semantic Language Modelling Based on PolEval
K Wołk – Proceedings of the PolEval 2018 Workshop – 2018.poleval.pl
… for other languages. 3. Toolkits Used in the Research For language model training we firstly used the most common SRILM toolkit (Stolcke 2002). The fundamental challenge that language models handle is sparse data. It is …
IIT (BHU) Varanasi at MSR-SRST 2018: A Language Model Based Approach for Natural Language Generation
A Chawla, A Sharma, S Singh, AK Singh – researchgate.net
… sentence. For this, we make use of the SRILM Toolkit (Stolcke, 2002). Before … line. 2. After we have the vocab file with us, we make use of this and the ordered sentence data to generate a .lm file using the SRILM toolkit. This …
The AFRL IWSLT 2018 Systems: What Worked, What Didn’t
B Ore, E Hansen, K Young, G Erdmann, J Gwinnup – 2018 – apps.dtic.mil
… network (RNN) LM. Interpolated bigram, trigram, and 4-gram LMs were estimated using the SRILM Toolkit,1 and a RNN maximum entropy LM was trained using the RNNLM Toolkit.2 The RNN included 160 hidden units, 1http://www …
TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages.
N Fathima, T Patel, C Mahima, A Iyengar – Interspeech, 2018 – isca-speech.org
… was provided by the challenge organizers. The SRI Language Modeling (SRILM) toolkit [9] was used to train Kneser-Ney smoothed trigram LMs on the training text data of each language. The Lexicon for each language uses …
GTM-IRLab Systems for Albayzin 2018 Search on Speech Evaluation.
P Lopez-Otero, LD Fernández – IberSPEECH, 2018 – isca-speech.org
… Specifically, two fourgram-based language models were trained following the Kneser-Ney discounting strategy using the SRILM toolkit [15], and the final LM was obtained by mixing both LMs using the SRILM static n-gram interpolation functionality …
Translation of Biomedical Documents with Focus on Spanish-English
MS Duma, W Menzel – Proceedings of the Third Conference on Machine …, 2018 – aclweb.org
… We used the SRILM toolkit (Stolcke, 2002) and Kneser-Ney discounting (Kneser and Ney, 1995) for estimating 5-grams LMs. All the experiments benefited from the interpolated language model, including the strong baseline and the MML experiment …
Extended Language Modeling Experiments for Kazakh
B Myrzakhmetov, Z Kozhirbayev – ???????? ???? ?????????? …, 2018 – en.telconf.tatar
… experiments. In Kazakh, n-gram based language models still used in Speech Processing [15] and Machine translation [16] tasks. We trained n-gram models with the SRILM toolkit [17] with adding 0 smoothing technique. For …
Exploiting Parts-of-Speech for Improved Textual Modeling of Code-Switching Data
G Sreeram, R Sinha – 2018 Twenty Fourth National Conference …, 2018 – ieeexplore.ieee.org
… of the proposed approach. Also, the performances for the 5-gram LMs trained on Hindi and Hinglish data using SRILM toolkit [23] are computed for the reference purpose. IV-B. Parameter tuning RNN-based language models …
The NECTEC 2015 Thai Open-Domain Automatic Speech Recognition System
P Sertsi, S Kasuriya, P Chootrakool… – Advances in Natural …, 2018 – Springer
… 4] plus a silence. N-gram language models with Chen and Goodman’s modified Kneser-Ney discounting were constructed from the overall text data presented in the Table 1 using the SRILM toolkit [15]. The number of unique …
Style Transfer Through Multilingual and Feedback-Based Back-Translation
S Prabhumoye, Y Tsvetkov, AW Black… – arXiv preprint arXiv …, 2018 – arxiv.org
… tasks. 3.4 Experimental Setup We used data from Workshop in Statistical Machine Translation 2015 (WMT15) (Bojar et al., 2015) and sequence-sequence framework 2We use the SRILM toolkit (Stolcke, 2002) Page 4. Model …
Unsupervised domain adaptation by adversarial learning for robust speech recognition
P Denisov, NT Vu, MF Font – Speech Communication; 13th ITG …, 2018 – ieeexplore.ieee.org
… For decoding we also trained two 3-gram language models on the transcripts from the training data and on the CommonCrawl subset and interpolated them with SRILM toolkit [32]. The perplexity of the language model on our testing data set is 209.47 …
New baseline in automatic speech recognition for Northern Sámi
J Leinonen, P Smit, S Virpioja, M Kurimo – Proceedings of the Fourth …, 2018 – aclweb.org
… A similar process was used to train the BLSTM and Chain model to generate networks with seven and six layers respectively. For a word-based system, we trained a Kneser-Ney smoothed 3-gram model with the SRILM toolkit (Stolcke, 2002) …
A dataset for document grounded conversations
K Zhou, S Prabhumoye, AW Black – arXiv preprint arXiv:1809.07358, 2018 – arxiv.org
… responses. The test presents the chat history (1 utterance) and then, in random 2The total number of tokens is 46000, and we limit the vocabulary to be 10000 tokens. 3We use the SRILM toolkit (Stolcke, 2002) Page 5. order, its …
Signal Processing Cues to Improve Automatic Speech Recognition for Low Resource Indian Languages.
A Baby, K Pandia, HA Murthy – SLTU, 2018 – isca-speech.org
… The syllables are combined later to obtain word boundaries. 3.3. Language modeling The SRILM toolkit is used to train the language model [32]. 4-gram models are learned to build the language model. 3.4. GMM based training …
Slovak broadcast news speech recognition and transcription system
M Lojka, P Viszlay, J Staš, D Hládek, J Juhár – International Conference on …, 2018 – Springer
… decision. 4.2 Language Modeling. The background language model was created using the SRILM toolkit [13]. It was restricted to the vocabulary size of about 500 thousand unique words and smoothed by the Witten-Bell algorithm …
Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech.
A Biswas, F de Wet, E van der Westhuizen, E Yilmaz… – Interspeech, 2018 – dsp.sun.ac.za
… to the development and test sets. The SRILM toolkit [33] was used to train and evaluate a bilingual English-isiZulu language model (LM) using the English-isiZulu training set transcriptions. This model was further interpolated …
Lexical Networks in !Xung and Ju
SA Hussain – 2018 – kb.osu.edu
… We train the trigram models using the KenLM Language Model Toolkit before using the SRI Language Modeling (SRILM) Toolkit to generate the simulated words (Heafield, 2011; Stolcke, 2002; Stolcke et. al. 2011). We generate pseudolexicons of size …
Development Of High-Performance And Large-Scale Vietnamese Automatic Speech Recognition Systems
DQ Truong, PN Phuong, TH Tung, LC Mai – Journal of Computer Science …, 2018 – vjs.ac.vn
… SCALE VIETNAMESE 345 We used SRILM toolkit for modeling training and used perplexity for evaluating the performance. Various n-gram models have been conducted including 3-gram, 4-gram, and their pruned versions. The …
Impact of ASR Performance on Free Speaking Language Assessment.
K Knill, MJF Gales, K Kyriakopoulos, A Malinin… – …, 2018 – apc38.user.srcf.net
… as described in [10]. A Kneser-Ney trigram LM is trained on 186k words from the System 1 training data, and interpolated with a general LM trained on Broadcast News English [34], using the SRILM toolkit [35]. A 334 hours …
The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task.
D Jülg, M Kunstek, CP Freimoser, K Berkling… – Interspeech, 2018 – researchgate.net
… The language model used in the baseline ASR is a trigram language model trained on all the text of ST1 train using the SRILM toolkit [13]. The described system is provided with the task and is based on the best system for the Shared Task Ed.1 competition ASR component [14] …
A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model.
G Sreeram, R Sinha – Interspeech, 2018 – isca-speech.org
… By conducting tuning experiments on Hindi test data, the parameter corresponding to the number of classes is set to be 50 and the variable corresponding to backpropagation through time (BPTT) is set as 5. Also, the 5-gram LM is trained using the SRILM toolkit [30] by setting …
DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018.
HB Sailor, MVS Krishna, D Chhabra, AT Patil… – Interspeech, 2018 – isca-speech.org
… D cepstral features. The LDA-MLLT is applied to reduce the dimension and decorrelate the context-based cepstral features. The 3-gram LM is built using the SRILM toolkit [28] from the training corpus. The alignments obtained …
Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs.
K Irie, Z Lei, L Deng, R Schlüter, H Ney – Interspeech, 2018 – isca-speech.org
… word Switchboard training data mentioned in Sec. 4.1. using SRILM toolkit [22]. We use this model for decoding and apply the neural language model in the second pass rescoring. The application of forward and backward LSTM …
Automatic Speech Recognition for Humanitarian Applications in Somali
R Menon, A Biswas, A Saeb, J Quinn… – arXiv preprint arXiv …, 2018 – arxiv.org
… The incorporation of such artificial data has successfully improved speech recognition performance for other authors [13]. The language model for Somali was generated using the SRILM toolkit [14]. When training only on the language model …
Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer.
S Feng, T Lee – Interspeech, 2018 – isca-speech.org
… validation set. Syllable error rates (SERs) of CA test set is chosen for evaluation. A syllable trigram language model trained with transcriptions of CA training data is used during decoding, using SRILM toolkit [34]. SERs of baseline …
On Continuous Speech Recognition of Indian English
X Jin, K Zhang, X Huang, M Miao – Proceedings of the 2018 International …, 2018 – dl.acm.org
… ACAI’18, December, 2018, Sanya, China The experiment is mainly built with Kaldi toolkit [13][15]. The language model is constructed with SRILM toolkit [14] and a 3-gram language model is obtained from the training of tagged text corpus …
Classification of closely related sub-dialects of Arabic using support-vector machines
S Wray – Proceedings of the Eleventh International Conference …, 2018 – aclweb.org
… Classification was performed on each tweet individually. To generate n-gram probabilities used as features for the SVM, I used the SRI Language Modeling (SRILM) toolkit (Stolcke and others, 2002) encompassing the following …
Progress and tradeoffs in neural language models
R Tang, J Lin – arXiv preprint arXiv:1811.00942, 2018 – arxiv.org
… Both QRNN models have window sizes of r = 2 for the first layer and r = 1 for the rest. For the KN-5 model, we trained an off-the-shelf five-gram model using the popular SRILM toolkit (Stolcke, 2002). We did not specify any special hyperparameters. 3.2 Infrastructure …
Automatic Identification of Moroccan Colloquial Arabic
SL Aouragh, H Jaafa – … Processing: From Theory to Practice: 6th …, 2018 – books.google.com
… The dataset was splitted to train and test set. Then, they used SRILM toolkit (Andreas 2002) to build a language model where the goal is to find the best sequence of tags for a given sentence. By using MADAMIRA morphological analyzer (Pasha et al …
Role-specific Language Models for Processing Recorded Neuropsychological Exams
T Al Hanai, R Au, J Glass – Proceedings of the 2018 Conference of the …, 2018 – aclweb.org
… Language Model: A tri-gram language model was trained for each of the speaker and tester using the SRILM toolkit (Stolcke et al., 2002). • Lexicon: We generated the word pronunciations using the LOGIOS lexical tool1. We decoded the audio in three ways …
Investigating the Use of Mixed-Units Based Modeling for Improving Uyghur Speech Recognition.
P Hu, S Huang, Z Lv – SLTU, 2018 – isca-speech.org
… this corpus is used for vocabulary selection for mixed units and to estimate back-off N-gram LMs using modified Kneser-Ney smoothing by the SRILM toolkit[27] To evaluate the performance, two test sets are prepared for the Uyghur speech recognition task …
Factorised Hidden Layer Based Domain Adaptation for Recurrent Neural Network Language Models
M Hentschel, M Delcroix, A Ogawa… – 2018 Asia-Pacific …, 2018 – ieeexplore.ieee.org
… 27]. D. Penn Treebank Results First, we will show PPL results for the validation and test sets of PTB. As a baseline N-gram LM, we estimated a trigram LM with Kneser-Ney [28] smoothing using the SRILM toolkit [29]. However …
MLLP-UPV and RWTH Aachen Spanish ASR Systems for the IberSpeech-RTVE 2018 Speech-to-Text Transcription Challenge.
J Jorge, AA Martinez-Villaronga, P Golik, A Giménez… – …, 2018 – isca-speech.org
… Third, we trained two standard Kneser-Ney smoothed 4-gram LMs on the train and subs-C24H sets using the SRILM toolkit [9]. Rows (a) and (b) of Table 3 show the perplexities obtained with these models on the dev1-dev and dev2 sets …
Combined Speaker Clustering and Role Recognition in Conversational Speech.
N Flemotomos, P Papadopoulos, J Gibson… – Interspeech, 2018 – isca-speech.org
… The train set is only used to build the LMs and AMs described in Section 2.3 corresponding to the different roles. The LMs are 3-gram models trained (and later evaluated) using the SRILM toolkit [28] with manually derived transcriptions of the recordings …
Improving ASR for Code-Switched Speech in Under-Resourced Languages Using Out-of-Domain Data.
A Biswas, E van der Westhuizen, T Niesler, F de Wet – SLTU, 2018 – dsp.sun.ac.za
… code-switched adaptation. 5. Language modelling The SRILM toolkit [23] was used to train and evaluate a bilingual 3-gram language model trained on the English-isiZulu training data transcriptions. This language model was …
CLMAD: A Chinese Language Model Adaptation Dataset
Y Bai, J Tao, J Yi, Z Wen, C Fan – 2018 11th International …, 2018 – ieeexplore.ieee.org
… Standard trigram backoff language models with Kneser-Ney discount methods are trained on training set of each domain, and perplexities on each testing set are computed with these models. All models are trained with SRILM toolkit [19] …
LSTM language model adaptation with images and titles for multimedia automatic speech recognition
Y Moriya, GJF Jones – 2018 IEEE Spoken Language …, 2018 – ieeexplore.ieee.org
… Kaldi [21]. The acoustic model was trained with the nnet3 module, and the n-gram language model for decoding word graphs was 3-gram with modified Kneser-Ney interpolation using the SRILM toolkit [22, 23, 24]. The validation …
Follow-up Question Generation Using Pattern-based Seq2seq with a Small Corpus for Interview Coaching.
MH Su, CH Wu, KY Huang, QB Hong, HH Huang – Interspeech, 2018 – isca-speech.org
… question sentence pattern according to the constructed word class table. As there are many candidate questions after word filling, this study uses the n-gram SriLM toolkit to choose the best question as the interviewer’s question …
Rapid Collection of Spontaneous Speech Corpora Using Telephonic Community Forums.
AA Raza, A Athar, S Randhawa, Z Tariq, MB Saleem… – Interspeech, 2018 – zaintq.com
… 5.2. Language Model and Pronunciation Lexicon We use a trigram language model with Kneser-Ney discounting, based on training transcripts, built using SRILM toolkit [46]. Our LM has 74K tokens (5K types), an OOV rate of 3.64% and perplexity of 37.04 on test data …
The University of Birmingham 2018 Spoken CALL Shared Task Systems.
M Qian, X Wei, P Jancovic, MJ Russell – Interspeech, 2018 – research.birmingham.ac.uk
… We used a trigram language model (LM) trained on the reference transcriptions of the ST data using the SRILM toolkit [9]. The LM1 denotes model obtained based on the reference transcriptions of ST12 train and used during the ASR development …
Disfluency detection using a noisy channel model and a deep neural language model
PJ Lou, M Johnson – arXiv preprint arXiv:1808.09091, 2018 – arxiv.org
… smoothed 4-gram language models with the LSTM corresponding on the reranking process of the noisy channel model. We estimate the 4-gram models and assign probabilities to the fluent parts of disfluency analyses using the SRILM toolkit (Stolcke, 2002) …
Decipherment for Adversarial Offensive Language Detection
Z Wu, N Kambhatla, A Sarkar – Proceedings of the 2nd Workshop on …, 2018 – aclweb.org
… 4 Experimental Setup 4.1 Language Model Character Language Model: We used the SRILM toolkit (Stolcke et al., 2002) to train a character language model (LM) from Wiktionary and Europarl data. We trained two LMs and interpolated them using a mixture model …
Automatic speech recognition system for people with speech disorders
ME Ramaboka – 2018 – ulspace.ul.ac.za
… stage. The cepstral mean combined variance normalization (CMVN) was applied to normalise the features. A third-order language model was trained using the SRI Language Modelling (SRILM) toolkit. A recognition accuracy of 65.58% was obtained …
Mixing Textual Data Selection Methods for Improved In-Domain Data Adaptation
K Wołk – World Conference on Information Systems and …, 2018 – Springer
… The sizes of the perplexity-based quasi-in-domain subsets must be equal. In practice, we work with the SRI Language Modeling (SRILM) toolkit to train 5-gram LMs with interpolated modified Kneser–Ney discounting [17, 18]. 3.3 Levenshtein Distance …
Probabilistic Indexing and Search for Information Extraction on Handwritten German Parish Records
E Lang, J Puigcerver, AH Toselli… – 2018 16th International …, 2018 – ieeexplore.ieee.org
… geometric character boundaries. As discussed in [9] (see also [10]), it provides a good approximation to the joint probability distribution p(c, x). 4With the SRILM Toolkit: http://www.speech.sri.com/projects/srilm Finally, following the …
Automatic machine translation for arabic tweets
F Mallek, NT Le, F Sadat – Intelligent Natural Language Processing …, 2018 – Springer
… 36]. Once the lexical normalization step was done, we obtained a corpus of tweets in English ready for building the LM using the SRILM toolkit [45]. Pre … corpus). The LMs are 3-gram LM, generated with the SRILM toolkit [45]. The …
Comparing different feedback modalities in assisted transcription of manuscripts
CD Martínez-Hinarejos… – 2018 13th IAPR …, 2018 – ieeexplore.ieee.org
… manually revised. B. System setup The different recognition systems were implemented by using the iATROS recogniser [18], and the SRILM toolkit [19] was used to transform the WG recognition outputs into CN. 1) Features …
Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling.
S Feng, T Lee – Interspeech, 2018 – isca-speech.org
… bottleneck layer. A syllable trigram language model trained with transcriptions of CUSENT training data is used during decoding. The language model is trained with SRILM toolkit [41]. 6.3. Speaker adaptation of target speech The …
Role Annotated Speech Recognition for Conversational Interactions
N Flemotomos, Z Chen, DC Atkins… – 2018 IEEE Spoken …, 2018 – ieeexplore.ieee.org
… 6: Distribution of the duration of the intervals between speaker change points in the MI dataset. The LMs are 3-gram models with Kneser-Ney smoothing, trained with the SRILM toolkit [29]. The training corpus can be created either by concatenating consecutive turns as in Fig …
Disambiguation of verbal shifters
M Wiegand, S Loda, J Ruppenhofer – Proceedings of the Eleventh …, 2018 – aclweb.org
… This is a standard configuration proven to yield good results (Turian et al., 2010). We induce the clusters using the SRILM-toolkit (Stolcke, 2002). Word Embeddings. A more recent alternative to Brown clustering is the usage of word embeddings …
Cross-Lingual Content Scoring
A Horbach, S Stennmanns, T Zesch – … on Innovative Use of NLP for …, 2018 – aclweb.org
… language models. We build a trigram language model per prompt for the English data using the SRILM toolkit (Stolcke, 2002) and measure the perplexity of translated German answers under that language model. We find …
Improved Spoken Uyghur Segmentation for Neural Machine Translation
C Mi, Y Yang, X Zhou, L Wang… – 2018 IEEE 30th …, 2018 – ieeexplore.ieee.org
… morphological features are extracted with an in-house CRF based Uyghur morphological analyzer; we derive the bilingual features based on the widely used word alignment tool GIZA++ 3; Uyghur monolingual language model features are extracted by the SRILM toolkit [18] …
A comparison of different punctuation prediction approaches in a translation context
V Vandeghinste, L Verwimp, J Pelemans… – Proceedings …, 2018 – lirias.kuleuven.be
… on English. The n-gram models are 4-gram LMs (5-grams did not improve the performance) with interpolated modified Kneser-Ney smoothing (Chen and Goodman, 1999), trained with the SRILM toolkit (Stolcke, 2002). We …
Automatic Transcription and Subtitling of Slovak Multi-genre Audiovisual Recordings
J Juhár – … Technology. Challenges for Computer Science and …, 2018 – books.google.com
… 4.4 Language Modeling for Speech Recognition The background LM was created by using the SRILM Toolkit [21]. It was restricted to the vocabulary size of 500 thousand unique words and smoothed with the Witten-Bell back-off algorithm …
Athena: Automated tuning of genomic error correction algorithms using language models
M Abdallah, A Mahgoub, S Bagchi… – arXiv preprint arXiv …, 2018 – arxiv.org
… 3 Evaluation with Real Datasets In this section, we evaluate Athena variants separately by correcting errors in 5 real datasets and evaluating the quality of the resultant assembly. We implement the N-Gram model using the SRILM toolkit [16] …
Parallel Corpora for bi-Directional Statistical Machine Translation for Seven Ethiopian Language Pairs
ST Abate, M Melese, MY Tachbelie… – Proceedings of the First …, 2018 – aclweb.org
… SRILM toolkit (Stolcke, 2002) has been used to develop the language models using target language sentences from the training and tuning sets of parallel corpora. Bilingual Evaluation Under Study (BLEU) is used for automatic scoring. 5.2 Experimental Results …
Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data
KD Chowdhury, M Hasanuzzaman, Q Liu – Proceedings of the Workshop …, 2018 – aclweb.org
… We use the SRILM toolkit (Stolcke, 2002) for building a language model and GIZA++ (Och and Ney, 2000) with the grow-diag-final-and heuristic for extracting phrases from Hic – Enc .The trained system is tuned using Minimum Error Rate Training (Och, 2003) …
Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data
K Dutta Chowdhury, M Hasanuzzaman, Q Liu – 2018 – doras.dcu.ie
… We use the SRILM toolkit (Stolcke, 2002) for building a language model and GIZA++ (Och and Ney, 2000) with the grow-diag-final-and heuristic for extracting phrases from Hic – Enc .The trained system is tuned using Minimum Error Rate Training (Och, 2003) …
Integrating pronunciation into Chinese-Vietnamese statistical machine translation
AT Huu, H Huang, Y Guo, S Shi… – Tsinghua Science and …, 2018 – ieeexplore.ieee.org
… A 5-gram language model is estimated using the SRILM toolkit[8]. The rest of the parameters are the default settings provided by Moses. A conversation corpus is used as the dataset for the experiments. This corpus includes 550000 sentence pairs …
Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge.
C Montacié, MJ Caraty – Interspeech, 2018 – isca-speech.org
… Each language model has been trained on the intonation contours of the Train set corresponding to its class. The three language models (LM_low, LM_medium and LM_high) have been computed using SRILM toolkit [44]. This …
Thank “Goodness”! A Way to Measure Style in Student Essays
S Mathias, P Bhattacharyya – Proceedings of the 5th Workshop on …, 2018 – aclweb.org
… 4.4 Language modeling features These are language modeling features of the essay using the English Wikipedia from the Leipzig corpus (Goldhahn et al., 2012). These features are the output from the SRILM toolkit (Stolcke et al., 2002). We use the following features …
Joint Word-and Character-level Embedding CNN-RNN Models for Punctuation Restoration
MÁ Tündik, G Szaszák – 2018 9th IEEE International …, 2018 – ieeexplore.ieee.org
… The language model for the ASR was trained on the corpus used for the punctuation model with the SRILM toolkit [19]. The deep neural network based acoustic models were trained on 500+ hours of transcribed speech using the Kaldi ASR toolkit [20] …
Offline Arabic handwriting recognition using BLSTMs combination
SK Jemni, Y Kessentini, S Kanoun… – 2018 13th IAPR …, 2018 – ieeexplore.ieee.org
… This is given by: \(P(w_i \mid h_1^{i-1}) \approx P(w_i \mid h_{i-N+1}^{i-1})\) (2) In this work, n-gram (n=3) LMs are estimated on the training corpus of the KHATT database using the discounting method. SRILM toolkit [20] was used for this purpose. III …
Boosting the deep multidimensional long-short-term memory network for handwritten recognition systems
D Castro, BLD Bezerra… – 2018 16th International …, 2018 – ieeexplore.ieee.org
… HMM scheme, as outlined in Section III. We used a standard language model setup [6], [22], [5], [7], [8]. The SRILM toolkit [23] is used for training interpolated n-gram language models. A smoothed word tri-gram language model …
Lightly supervised alignment of subtitles on multi-genre broadcasts
O Saz, S Deena, M Doulaty, M Hasan, B Khaliq… – Multimedia Tools and …, 2018 – Springer
… the subtitles. In this work, this is achieved using the SRILM toolkit [40] and biases the decoder and the lattice rescoring towards producing hypotheses which are closer to the words and language used in the subtitles. Such interpolation …
Lexical Networks in !Xung
SA Hussain, M Elsner, A Miller – Proceedings of the Fifteenth Workshop …, 2018 – aclweb.org
… simulating “words” similar to those in the actual language. We train the trigram models using the SRI Language Modeling (SRILM) Toolkit (Stolcke, 2002; Stolcke et al., 2011). We generate pseudolexicons of size \(2^{10}\), \(2^{11}\), \(2^{12}\) …
Sequence Teacher-Student Training of Acoustic Models for Automatic Free Speaking Language Assessment
Y Wang, JHM Wong, MJF Gales… – 2018 IEEE Spoken …, 2018 – ieeexplore.ieee.org
… The models were trained on the combined crowd-sourced transcriptions. An in-domain LM was trained on 1.83M words from the combined crowd-sourced transcriptions of the training data, using the SRILM toolkit [38]. This …
Statistical Approach to Noisy-Parallel and Comparable Corpora Filtering for the Extraction of Bi-lingual Equivalent Data at Sentence-Level
K Wołk, E Zawadzka, A Wołk – World Conference on Information Systems …, 2018 – Springer
… target specific domain. The size of the perplexity-based quasi in-domain subsets must be equal. In practice, we use the SRILM toolkit to train 5-gram LMs using interpolated modified Kneser-Ney discounting [38, 39]. In the realm …
Distilling GRU with Data Augmentation for Unconstrained Handwritten Text Recognition
M Liu, Z Xie, Y Huang, L Jin… – 2018 16th International …, 2018 – ieeexplore.ieee.org
… It is noteworthy for general-purpose recognition and fair comparison with previous works [2]–[4], our system had 7356 classes. For language modeling, we constructed a 3-gram statistical language model using the SRILM toolkit [25] …
English-Wolaytta Machine Translation Using Statistical Approach
M MARA – 2018 – repository.smuc.edu.et
… translation approach, we have used most popular and freely available SMT tools such as: SRILM toolkit for language model, MGIZA++ align the corpus at word level by using IBM models (1-5). Decoding has been done using Moses, which a statistical machine translation …
Lightly supervised alignment of subtitles on multi-genre broadcasts
O Saz Torralba, S Deena, M Doulaty… – Multimedia …, 2018 – eprints.whiterose.ac.uk
… the subtitles. In this work, this is achieved using the SRILM toolkit [40] and biases the decoder and the lattice rescoring towards producing hypotheses which are closer to the words and language used in the subtitles. Such interpola …
Stacked Neural Networks With Parameter Sharing For Multilingual Language Modeling
BK Khonglah, S Madikeri, N Rekabsaz, N Pappas… – navid-rekabsaz.com
… the ASR systems. SRILM toolkit is used to create N-gram language models. The neural language models are implemented in pytorch. A modified version of TDNN, released in [13] is used in our experiments. We also implement …
Uniform Information Density Effects on Syntactic Choice in Hindi
A Jain, V Singh, S Ranjan, R Rajkumar… – Proceedings of the …, 2018 – aclweb.org
… In total, our dataset consisted of 8736 reference sentences and 175801 variants. We estimated lexical surprisal using trigram models trained on 1 million Hindi sentences from EMILLE Corpus (Baker et al., 2002) using the SRILM toolkit (Stolcke, 2002) …
Experimenting with lipreading for large vocabulary continuous speech recognition
K Paleček – Journal on Multimodal User Interfaces, 2018 – Springer
… Inclusion of the words from the TULAVD corpus only ensures that the test data will not contain any previously unseen words. We employed the SRILM toolkit [26] with Kneser-Ney smoothing for the language model training. …
A neural reordering model based on phrasal dependency tree for statistical machine translation
S Farzi, H Faili, S Kianian – Intelligent Data Analysis, 2018 – content.iospress.com
Machine translation is an important field of research and development. Word reordering is one of the main problems in machine translation. It is an important factor of quality and efficiency of machine translations and becomes more difficult when it.
Morphology In Statistical Machine Translation From English To Highly Inflectional Language
MS Maučec, G Donaj – Information Technology and Control, 2018 – itc.ktu.lt
… A 3-gram language model with modified Kneser-Ney discounting was built on the training corpus by the SRILM toolkit [26]. Singletons were excluded. The perplexity of Slovenian language model was 131, and that of English was 62 …
Discriminative ridge regression algorithm for adaptation in statistical machine translation
M Chinea-Rios, G Sanchis-Trilles… – Pattern Analysis and …, 2018 – Springer
… The language model used was a 5-gram with modified Kneser–Ney smoothing [11], built with the SRILM toolkit [28]. The translation quality would ideally be measured by humans. However, this is a very expensive resource, not commonly available in research tasks …
A novel rule based machine translation scheme from Greek to Greek Sign Language: Production of different types of large corpora and Language Models evaluation
D Kouremenos, K Ntalianis, S Kollias – Computer Speech & Language, 2018 – Elsevier
Targeted syntactic evaluation of language models
R Marvin, T Linzen – arXiv preprint arXiv:1808.09031, 2018 – arxiv.org
… tated data (CCG supertags). N-gram model: We trained a 5-gram model on the same 90M word corpus using the SRILM toolkit (Stolcke, 2002) which backs off to smaller n-grams using Kneser-Ney smoothing. Single-task RNN …
A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition
ZR Wang, J Du, WC Wang, JF Zhai, JS Hu – International Journal on …, 2018 – Springer
… In this work, we adopt the Katz smoothing [42]. The SRILM toolkit [43] is employed to generate Katz N-gram LM with different orders. Without using N-gram (\(N=0\)), the recognition accuracy might be sharply declined as shown in the experiments …
Transfer Learning for British Sign Language Modelling
B Mocialov, H Hastie, G Turner – Proceedings of the Fifth Workshop on …, 2018 – aclweb.org
… As a result, the model trained on English language and applied to the BSL scored 1051.91 in perplexity using SRILM toolkit (Stolcke, 2002). Conversely, a model trained on the BSL has been applied to the English language and scored 1447.23 in perplexity …
Ameliorated language modelling for lecture speech recognition of Indian English
DK Phull, GB Kumar – Sādhanā, 2018 – Springer
… clustering to be 1000. The LM has been built using a variKN toolkit [26] considering the frequent 64k words for trigram LMs. The interpolation of the LM has been performed using a SRILM toolkit [27]. We have used WER (%), perplexity …
An Arabic Morphological Analyzer and Generator with Copious Features
D Taji, S Khalifa, O Obeid, F Eryani… – Proceedings of the …, 2018 – aclweb.org
… each stem entry in the database. The scores were generated from the train set (Diab et al., 2013) of the PATB. We used the SRILM toolkit (Stolcke, 2002) to generate the scores with no smoothing. In Section 6, we show that these …
Arabic Speech Recognition: Challenges and State of the Art
SM Abdou, AM Moussa – … Speech And Image Processing For Arabic …, 2018 – World Scientific
… One of effective tools for training language models is the SRILM toolkit that includes most of state of art alternatives.11 2.4. Decoding … 21 FLMs have been implemented as an add-on to the widely-used SRILM toolkit. Further details can be found in Ref. 43 …
Feature Optimization for Predicting Readability of Arabic L1 and L2
H Saddiki, N Habash, V Cavalli-Sforza… – Proceedings of the 5th …, 2018 – aclweb.org
… eg Fig. 1) in preparation for feature extraction. Then, both raw text and annotations from the training set are used to build LMs for each of the 4 levels of readability (Table 3) with the SRILM toolkit (Stolcke et al., 2002). At this …
Reassessing the proper place of man and machine in translation: a pre-translation scenario
J Ive, A Max, F Yvon – Machine Translation, 2018 – Springer
… 2013). We trained a 6-gram language model with modified Kneser-Ney smoothing (Kneser and Ney 1995) on the French part of the MT training data using the SRILM toolkit (Stolcke 2002). The MT system was tuned with kb-mira and 300-best lists (Cherry and Foster 2012) …
An online English-Khmer hybrid machine translation system.
S Jabin, N Chatterjee, S Samak, K Sokphyrum, J Sola – IJISTA, 2018 – researchgate.net
… dictionary. The SRI language modelling (SRILM) toolkit has been used for training a 3-gram LM for experimentation. In the present work all the translated parallel corpus are based on Samdach Sangha Raja ChounNath’s dictionary …
Digital Automatic Speech Recognition using Kaldi
S Alyousefi – 2018 – repository.lib.fit.edu
Page 1. Digital Automatic Speech Recognition using Kaldi By Sarah Habeeb Alyousefi Bachelor of Science Computer and software Engineering Al-Mustansiriya University College of Engineering A thesis submitted to the College of Engineering at Florida Institute of Technology …
Building and Exploiting Domain-Specific Comparable Corpora for Statistical Machine Translation
R Sellami, F Sadat, LH Beluith – Intelligent Natural Language Processing …, 2018 – Springer
… Word alignment is done with GIZA++ [19]. We implemented a 5-gram language model using the SRILM toolkit [36]. We tokenized the Arabic side of the training, development and test data using the MADA + TOKAN morphological disambiguation system [26] …
Transcription of spanish historical handwritten documents with deep neural networks
E Granell, E Chammas, L Likforman-Sulem… – Journal of …, 2018 – mdpi.com
… corresponding symbol. Language Models (LM) were estimated as n-grams with Kneser–Ney back-off smoothing [26] by using the SRILM toolkit [27]. Different LMs were used in the experiments at word, sub-word and character levels. For …
Improvement in monaural speech separation using sparse non-negative tucker decomposition
YV Varshney, P Upadhyaya, ZA Abbasi… – International Journal of …, 2018 – Springer
… For better acoustic modelling, alignment is required after each ? + (? ? LDA-MLLT and SAT. 4.3 Language model. Any language model that has the FST representation can be used in Kaldi. Here, SRILM toolkit is used for building and applying statistical language models …
Exploring Implicit Semantic Constraints for Bilingual Word Embeddings
J Su, Z Song, Y Lu, M Xu, C Wu, Y Chen – Neural Processing Letters, 2018 – Springer
… and test sets, respectively. Table 2 shows the statistics of the various data sets. We applied SRILM Toolkit 1 to train a 4-gram language model on the Xinhua portion of Gigaword corpus (306 million words). We chose MOSES 2 …
Language Modelling for Code-Switched Text
T Parekh – 2018 – saurabhgarg1996.github.io
Page 1. Language Modelling for Code-Switched Text Bachelors Thesis Project Bachelors of Technology in Computer Science and Engineering by Tanmay Parekh (140100011) in co-ordination with Saurabh Garg (140070003) under the guidance of Prof. Preethi Jyothi …
Advanced Quality Measures for Speech Translation
NT Le – 2018 – tel.archives-ouvertes.fr
Page 1. HAL Id: tel-01891892 https://tel.archives-ouvertes.fr/tel-01891892 Submitted on 10 Oct 2018 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not …
Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model
S Kuang, D Xiong – arXiv preprint arXiv:1806.04466, 2018 – arxiv.org
… We trained a 5-gram language model on the Xinhua portion of the Gigaword corpus using SRILM Toolkit with a modified Kneser-Ney smoothing. For RNNSearch, we used the parallel corpus to train the attention-based NMT model …
Affordance-based multi-contact whole-body pose sequence planning for humanoid robots in unknown environments
P Kaiser, C Mandery, A Boltres… – 2018 IEEE International …, 2018 – ieeexplore.ieee.org
… set of representative training motions. Training the n-gram model is based on textual representations of the observed motions as configuration pose sequences, facilitated using the SRILM Toolkit [29]. In addition, we are learning …
Lattice-to-sequence attentional Neural Machine Translation models
Z Tan, J Su, B Wang, Y Chen, X Shi – Neurocomputing, 2018 – Elsevier
Parallel Corpora for bi-lingual English-Ethiopian Languages Statistical Machine Translation
ST Abate, M Melese, MY Tachbelie… – Proceedings of the 27th …, 2018 – aclweb.org
… 2003) for aligning words and phrases. SRILM toolkit was used to develop language models using semi-automatically prepared corpora from the training and tuning corpora of target languages. Table 7 shows the sentence length …
Neural Speech Translation at AppTek
E Matusov, P Wilken, P Bahar… – International …, 2018 – workshop2018.iwslt.org
Page 119. Neural Speech Translation at AppTek Evgeny Matusov, Patrick Wilken, Parnia Bahar?, Julian Schamper?, Pavel Golik, Albert Zeyer?, Joan Albert Silvestre-Cerda+, Adria Martínez-Villaronga+, Hendrik Pesch, and …
Arabic corpus linguistics: major progress, but still a long way to go
I Zeroual, A Lakhouaja – Intelligent Natural Language Processing: Trends …, 2018 – Springer
… The corpus contains over 6,000 texts, totalling around 1 billion words, of which 800 million words are from dated texts and the rest parts are automatically dated by building a 5-gram language model with Kneser-Ney smoothing, using the SRILM toolkit (Stolcke et al. 2011) …
ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation
J Tejedor, DT Toledano, P Lopez-Otero… – EURASIP Journal on …, 2018 – biomedcentral.com
Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems submitted to the ALBAYZIN QbE STD 2016 Evaluation held as a part …
Syntax-Based Context Representation for Statistical Machine Translation
K Chen, T Zhao, M Yang – IEICE TRANSACTIONS on Information …, 2018 – search.ieice.org
Page 1. 3226 IEICE TRANS. INF. & SYST., VOL.E101–D, NO.12 DECEMBER 2018 PAPER Syntax-Based Context Representation for Statistical Machine Translation Kehai CHEN † , Student Member, Tiejun ZHAO †a) , Nonmember, and Muyun YANG † , Member …
Improvements in Serbian speech recognition using sequence-trained deep neural networks
E Pakoci, B Popović, DJ Pekar – Труды СПИИРАН, 2018 – mathnet.ru
… The Kneser-Ney smoothing method [16] with a pruning value of \(10^{-7}\) was applied to obtain the previously mentioned numbers, as it was proven to be optimal [3]. The language model was trained using the SRILM toolkit [17] …
Deep Sign: Enabling Robust Statistical Continuous Sign Language Recognition via Hybrid CNN-HMMs
O Koller, S Zargaran, H Ney, R Bowden – International Journal of …, 2018 – Springer
… In the following experiments the prior-scaling-factor \(\beta \) is set to 0.3 if not stated otherwise. The LM is estimated as n-gram using the SRILM toolkit by Stolcke (2002). The HMM is employed in bakis structure (Bakis 1976) …
Semi-supervised acoustic model training for speech with code-switching
E Yılmaz, M McLaren, H van den Heuvel… – Speech …, 2018 – Elsevier
Optimizing Automatic Evaluation of Machine Translation with the ListMLE Approach
M Li, M Wang – ACM Transactions on Asian and Low-Resource …, 2018 – dl.acm.org
… target side of the training corpus for machine translation with human references to form the monolingual training data, where we train a 4-gram language model based on the data, and compute the language model probability of the translation output using the SRILM toolkit [31] …
Understanding Reading Attention Distribution during Relevance Judgement
X Li, Y Liu, J Mao, Z He, M Zhang, S Ma – Proceedings of the 27th ACM …, 2018 – dl.acm.org
… We applied surprisal, which is the negative log-likelihood of a word in the context, to describe how unfamiliar a text is to users. By using the SRILM Toolkit [39], we built a bi-gram language model based on a large-scale online news data [40] …
A preordering model based on phrasal dependency tree
S Farzi, H Faili, S Kianian – Digital Scholarship in the …, 2018 – academic.oup.com
Abstract. Intelligent machine translation (MT) is becoming an important field of research and development as the need for translations grows. Currently, the wo.
Alignment-consistent recursive neural networks for bilingual phrase embeddings
J Su, B Zhang, D Xiong, Y Liu, M Zhang – Knowledge-Based Systems, 2018 – Elsevier
Turkish speech recognition
E Arısoy, M Saraçlar – Turkish Natural Language Processing, 2018 – Springer
… A vocabulary size of 50K words, yielding 11.8% OOV rate, was used to train a 3-gram word-based language model. The language model was built using the SRILM toolkit (Stolcke 2002) with interpolated Kneser-Ney smoothing …
Towards automatic assessment of spontaneous spoken English
Y Wang, MJF Gales, KM Knill, K Kyriakopoulos… – Speech …, 2018 – Elsevier
… This combined acoustic score is then used in Viterbi decoding. A Kneser-Ney trigram LM is trained on 186K words of BULATS test data and interpolated with a general English LM trained on a large broadcast news corpus, using the SRILM toolkit (Stolcke, 2002) …
Semantics in Shallow Models
O Bojar, R Sennrich, P Williams, I Skadiņa, D Deksne – 2018 – qt21.eu
Page 1. This document is part of the Research and Innovation Action “Quality Translation 21 (QT21)”. This project has received funding from the European Union’s Horizon 2020 program for ICT under grant agreement no. 645452. Deliverable D1.4 Semantics in Shallow Models …
A survey of diacritic restoration in abjad and alphabet writing systems
FỌ ASAHIAH, ỌÀ ỌDẸ́JỌBÍ… – Natural Language …, 2018 – cambridge.org
Page 1. Natural Language Engineering 24 (1): 123–154. © Cambridge University Press 2017 doi:10.1017/S1351324917000407 123 A survey of diacritic restoration in abjad and alphabet writing systems FRANKLIN O. L ´ADIÍP …
Morpheme-Based Bi-Directional Ge’ez-Amharic Machine Translation
T Kassa – 2018 – 213.55.95.56
Page 1. Addis Ababa University College of Natural and Computational Sciences School of Information Science Morpheme-Based Bi-directional Ge’ez -Amharic Machine Translation A Thesis Submitted in Partial Fulfillment of the Requirement for the …
Automatic language identification in texts: A survey
T Jauhiainen, M Lui, M Zampieri, T Baldwin… – arXiv preprint arXiv …, 2018 – arxiv.org
Page 1. arXiv:1804.08186v2 [cs.CL] 21 Nov 2018 Journal of Artificial Intelligence Research () Submitted 10/2018; published – Under Review Automatic Language Identification in Texts: A Survey Tommi Jauhiainen tommi.jauhiainen@helsinki.fi …
Multimodality, interactivity, and crowdsourcing for document transcription
E Granell, V Romero… – Computational …, 2018 – Wiley Online Library
Robust Spoken Term Detection using partial search and re-scoring hypothesized detections techniques
P VAN TUNG – 2018 – ntu.edu.sg
Page 1. Robust Spoken Term Detection using partial search and re-scoring hypothesized detections techniques A thesis submitted to the School of Computer Science and Engineering of the Nanyang Technological University by PHAM VAN TUNG …
Machine Translation of Arabic Dialects
WS Salloum – 2018 – academiccommons.columbia.edu
Page 1. Machine Translation of Arabic Dialects Wael Salloum Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2018 Page 2. © 2018 Wael Salloum All rights reserved …
Resource2Vec: Linked Data distributed representations for term discovery in automatic speech recognition
A Coucheiro-Limeres, J Ferreiros-Lopez… – Expert Systems with …, 2018 – Elsevier
Syntactic and semantic features for statistical and neural machine translation
M Nădejde – 2018 – era.lib.ed.ac.uk
Page 1. This thesis has been submitted in fulfilment of the requirements for a postgraduate degree (eg PhD, MPhil, DClinPsychol) at the University of Edinburgh. Please note the following terms and conditions of use: This work …
Neural Networks for Language Modeling and Related Tasks in Low-Resourced Domains and Languages
O TILK – 2018 – digi.lib.ttu.ee
Page 1. TALLINN UNIVERSITY OF TECHNOLOGY DOCTORAL THESIS 55/2018 Neural Networks for Language Modeling and Related Tasks in Low-Resourced Domains and Languages OTTOKAR TILK Page 2. TALLINN UNIVERSITY …
Neural Creative Language Generation
M Ghazvininejad – 2018 – search.proquest.com
Page 1. NEURAL CREATIVE LANGUAGE GENERATION by Marjan Ghazvininejad A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree …
Towards effective cross-lingual search of user-generated internet speech
A Khwileh – 2018 – doras.dcu.ie
Page 1. Towards Effective Cross-Lingual Search of User-Generated Internet Speech Ahmad Khwileh B.Tech., M.Tech. A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.) to the Dublin City University School of Computing …
Driver Behavior and Environment Interaction Modeling for Intelligent Vehicle Advancements
Y Zheng – 2018 – utd-ir.tdl.org
Page 1. DRIVER BEHAVIOR AND ENVIRONMENT INTERACTION MODELING FOR INTELLIGENT VEHICLE ADVANCEMENTS by Yang Zheng APPROVED BY SUPERVISORY COMMITTEE: _____ Dr. John HL Hansen, Chair …
Understanding stories via event sequence modeling
H Peng – 2018 – ideals.illinois.edu
Page 1. © 2018 Haoruo Peng Page 2. UNDERSTANDING STORIES VIA EVENT SEQUENCE MODELING BY HAORUO PENG DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy …
Compressive Cross-Language Text Summarization
EL Pontes – 2018 – hal.archives-ouvertes.fr
Page 1. HAL Id: tel-02003886 https://hal.archives-ouvertes.fr/tel-02003886 Submitted on 1 Feb 2019 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not …