IRSTLM (IRST Language Modeling) Toolkit 2013

IRST LM Toolkit

Notes:

IRSTLM was developed at the Istituto per la Ricerca Scientifica e Tecnologica (IRST), now part of the Fondazione Bruno Kessler (FBK); the toolkit is therefore sometimes referred to as FBK-IRST or the FBK IRSTLM Toolkit.

Resources:

Wikipedia:

See also:

BerkeleyLM | EGYPT Statistical Machine Translation Toolkit | IRSTLM Wiki | Kaldi Speech Recognition Toolkit | KenLM: Language Model Inference | MIT Language Modeling Toolkit | OpenMaTrEx Machine Translation System | RandLM

Language Modeling & Dialog Systems 2011 | Maxent (Maximum Entropy Modeling Toolkit) 2011 | Rule-based Language Modeling | SRILM (SRI Language Modeling Toolkit) 2011


Scalable Modified Kneser-Ney Language Model Estimation. K Heafield, I Pouzyrevsky, JH Clark, P Koehn – ACL (2), 2013 – aclweb.org … IRSTLM (Federico et al., 2008) does not implement modified Kneser-Ney but rather an approximation dubbed “improved Kneser-Ney” (or “modified shift-beta” depending on the version). … SRILM and IRSTLM were run until the test machine ran out of RAM (64 GB). … Cited by 26.
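
For reference (this summarizes Chen and Goodman's formulation, not the cited paper): modified Kneser-Ney uses three count-dependent discounts per n-gram order, whereas the shift-beta family applies simpler absolute discounting. In interpolated form,

    \[
    p_{\mathrm{KN}}(w \mid h) \;=\; \frac{\max\bigl(c(h,w) - D(c(h,w)),\, 0\bigr)}{\sum_{w'} c(h,w')} \;+\; \gamma(h)\, p_{\mathrm{KN}}(w \mid h'),
    \qquad
    D(c) = \begin{cases} D_1 & c = 1 \\ D_2 & c = 2 \\ D_{3+} & c \ge 3 \end{cases}
    \]
    \[
    Y = \frac{n_1}{n_1 + 2 n_2}, \qquad
    D_1 = 1 - 2Y\,\frac{n_2}{n_1}, \qquad
    D_2 = 2 - 3Y\,\frac{n_3}{n_2}, \qquad
    D_{3+} = 3 - 4Y\,\frac{n_4}{n_3},
    \]

where n_k is the number of n-grams of the current order seen exactly k times, h' is the shortened history, and γ(h) normalizes the distribution.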

Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners. K Sakaguchi, Y Arase, M Komachi – ACL (2), 2013 – aclweb.org … Table 3: Ratio of appropriate distractors (RAD) with a 95% confidence interval and inter-rater agreement statistics κ. model score trained on Google 1T Web Corpus (Brants and Franz, 2006) with IRSTLM toolkit. … net/projects/irstlm/files/irstlm/ …

Automatic Transcription of Polish Radio and Television Broadcast Audio D Koržinek, K Marasek, Ł Brocki – Intelligent Tools for Building a Scientific …, 2013 – Springer … Table 1: Experiment results comparing our system to the Julius baseline using models from IRSTLM on a 30k and 60k vocabulary. …

Phrase-Based Machine Translation of Under-Resourced Languages A Drummer – people.cs.uct.ac.za … The Moses toolkit was used along with Giza++ for alignment and IRSTLM for the language model. The researcher was unsuccessful in … in the training pipeline. Moses requires one of the following language modelling toolkits: IRSTLM [2], SRILM [5], RandLM, or KenLM. …

Statistical sentiment analysis performance in Opinum B Bonev, G Ramírez-Sánchez, SO Rojas – arXiv preprint arXiv:1303.0446, 2013 – arxiv.org … In our setup we use the IRSTLM open-source library for building the language model. … For Opinum we run IRSTLM twice during the training phase: once taking as input the opinions labeled as positive and once taking the negatives: Mp ← Irstlm(L(Op)), Mn ← Irstlm(L(On)). … Cited by 1.
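
To illustrate the two-model decision rule in the snippet above (Mp for positive, Mn for negative opinions), a minimal Python sketch follows. It is not Opinum's code: the file names are hypothetical, and the kenlm module is used only as a convenient reader for ARPA files such as those IRSTLM's compile-lm can produce.

    # Sketch of the Mp / Mn decision rule: score an opinion under both
    # language models and pick the one with the higher log-probability.
    # File names are hypothetical; kenlm serves as a generic ARPA scorer.
    import kenlm

    pos_lm = kenlm.Model("opinions.pos.arpa")  # Mp, trained on positive opinions
    neg_lm = kenlm.Model("opinions.neg.arpa")  # Mn, trained on negative opinions

    def classify(opinion):
        """Return 'positive' if Mp assigns a higher log10 score than Mn."""
        pos_score = pos_lm.score(opinion, bos=True, eos=True)
        neg_score = neg_lm.score(opinion, bos=True, eos=True)
        return "positive" if pos_score >= neg_score else "negative"

    print(classify("the camera is excellent and easy to use"))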

FBK’s Machine Translation Systems for the IWSLT 2013 Evaluation Campaign N Bertoldi, MA Farajian, P Mathur, N Ruiz, M Federico… – hlt.fbk.eu … 2.4.1 Mixture: Monolingual subcorpora can be combined into one mixture language model [11] by means of the IRSTLM toolkit [12]. … This technique, provided by the IRSTLM toolkit, consists in the linear interpolation of the n-gram probabilities from all component LMs. …
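
The mixture technique mentioned above combines component LMs by linear interpolation, p_mix(w|h) = Σ_i λ_i p_i(w|h) with Σ_i λ_i = 1. A conceptual Python sketch of estimating the weights by EM on held-out text (not the IRSTLM implementation; probs[i][t] is assumed to hold the probability component i assigns to held-out token t):

    # EM estimation of linear-interpolation weights for a mixture of LMs.
    # probs[i][t] = probability component LM i assigns to held-out token t.
    def estimate_mixture_weights(probs, iterations=20):
        n_comp, n_tok = len(probs), len(probs[0])
        lam = [1.0 / n_comp] * n_comp                 # start from uniform weights
        for _ in range(iterations):
            counts = [0.0] * n_comp
            for t in range(n_tok):
                mix = sum(lam[i] * probs[i][t] for i in range(n_comp))
                for i in range(n_comp):
                    counts[i] += lam[i] * probs[i][t] / mix   # posterior responsibility
            lam = [c / n_tok for c in counts]         # re-estimated weights
        return lam

    # Toy example: two component LMs scored on three held-out tokens.
    print(estimate_mixture_weights([[0.02, 0.10, 0.05], [0.01, 0.02, 0.20]]))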

Optimal translation of English to Bahasa Indonesia using statistical machine translation system T Mantoro, J Asian, R Octavian… – … Technology for the …, 2013 – ieeexplore.ieee.org … In order to produce an optimal translation, our previous work [2] studies four different Istituto per la Ricerca Scientifica e Tecnologica Language Modelling (IRSTLM) parameters, namely n-gram, smoothing, alignment and reordering from English to BI. …

The CNGL-DCU-Prompsit translation systems for WMT13 R Rubino, A Toral, SC Vaíllo, J Xie, X Wu… – Proceedings of the …, 2013 – aclweb.org … Individual language models (LMs), 5-gram and smoothed using a simplified version of the improved Kneser-Ney method (Chen and Goodman, 1996), are built for each monolingual corpus using IRSTLM 5.80.01 (Federico et al., 2008). … Cited by 7.

Project adaptation for mt-enhanced computer assisted translation M Cettolo, N Bertoldi, M Federico… – Proceedings of the MT …, 2013 – mtsummit2013.info … We reimplemented the data selection technique by Moore and Lewis (2010) and made it publicly available through the IRSTLM toolkit (Federico et al., 2008). … The method is available in the IRSTLM toolkit (Federico et al., 2008). … Cited by 3.
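
The Moore and Lewis (2010) criterion referenced above ranks candidate sentences by the cross-entropy difference between an in-domain LM and a general-domain LM and keeps the lowest-scoring ones. A small Python sketch under assumed scoring functions (illustrative, not the IRSTLM tool):

    # Moore-Lewis data selection: keep the sentences with the lowest
    # cross-entropy difference H_in(s) - H_gen(s).
    # score_in and score_gen are assumed callables returning per-word
    # cross-entropy under an in-domain and a general-domain LM.
    def select_sentences(sentences, score_in, score_gen, keep_fraction=0.2):
        ranked = sorted(sentences, key=lambda s: score_in(s) - score_gen(s))
        cutoff = max(1, int(len(ranked) * keep_fraction))
        return ranked[:cutoff]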

Statistical machine translation for automobile marketing texts S Läubli, M Fishel, M Weibel… – Proceedings of MT Summit …, 2013 – mtsummit2013.info … The only difference is that we used IRSTLM for language modeling (Federico et al., 2008) because of licensing issues. … Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. IRSTLM: an open source toolkit for handling large scale language models. … Cited by 1.

Rule Based Transliteration Scheme for English to Punjabi D Bhalla, N Joshi, I Mathur – arXiv preprint arXiv:1307.4300, 2013 – arxiv.org … Table 2 demonstrates the English-Punjabi parallel corpus that we have used for performing our experiment. In this we have trained our language model using IRSTLM toolkit [9]. Our transliteration system follows the steps which are represented in figure 1. Example: … Cited by 2.

Modernizing historical Slovene words with character-based SMT Y Scherrer, T Erjavec – BSNLP 2013, 2013 – halshs.archives-ouvertes.fr … The experiments have been carried out with the tools of the standard SMT pipeline: GIZA++ (Och and Ney, 2003) for alignment, Moses (Koehn et al., 2007) for phrase extraction and decoding, and IRSTLM (Federico et al., 2008) for language modelling. … Cited by 4.

Large-scale multiple language translation accelerator at the United Nations B Pouliquen, C Elizalde, M Junczys-Dowmunt… – mtsummit2013.info … Language models are being computed with the IRSTLM toolkit (Federico et al., 2008). We use 5-gram language models, but we first prune the model by setting a threshold that discards half of the least significant 5-grams, and then we apply the prune-lm tool provided by IRSTLM. …
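
A toy illustration of the threshold-based pruning described above (conceptual only; IRSTLM's prune-lm also adjusts back-off weights, which is omitted here):

    # Keep an n-gram only if it is a unigram or its probability reaches the
    # threshold set for its order; real pruning also recomputes back-offs.
    def prune(lm, thresholds):
        kept = {}
        for ngram, prob in lm.items():
            if len(ngram) == 1 or prob >= thresholds.get(len(ngram), 0.0):
                kept[ngram] = prob
        return kept

    toy_lm = {("the",): 0.05, ("the", "cat"): 1e-4, ("the", "dog"): 1e-7}
    print(prune(toy_lm, {2: 1e-6}))   # the rare bigram ("the", "dog") is discarded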

Applying Pairwise Ranked Optimisation to Improve the Interpolation of Translation Models. B Haddow – HLT-NAACL, 2013 – aclweb.org … Schroeder, 2007)), and is implemented in popular language modelling tools like IRSTLM (Federico et al., 2008) and SRILM (Stolcke, 2002). … 2008. IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models. In Proceedings of Interspeech, Brisbane, Australia. … Cited by 2.

Graph Model for Chinese Spell Checking Z Jia, P Wang, H Zhao – Sixth International Joint Conference on Natural …, 2013 – aclweb.org … For the similar character map, the data set provided by Liu et al. (2011) is used. The LM is built on the Academia Sinica corpus (Emerson, 2005) with the IRSTLM toolkit (Federico et al., 2008). … 2008. Irstlm: an open source toolkit for handling large scale language models. …

Issues in incremental adaptation of statistical mt from human post-edits M Cettolo, C Servan, N Bertoldi, M Federico… – Proceedings of the MT …, 2013 – matecat.com … The method is available in the IRSTLM toolkit (Federico et al., 2008). 4 Field Test … technique (Chen and Goodman, 1999) are estimated on the target side via the IRSTLM toolkit (Federico et al., 2008). … Cited by 3.

Building a reordering system using tree-to-string hierarchical model J Dlougach, I Galinskaya – arXiv preprint arXiv:1302.3057, 2013 – arxiv.org … 2.4.3 Language model We have decided to build a simple 3-gram language model based on the target sentences as a corpus using IRSTLM toolkit (Federico, Bertoldi and Cettolo, 2008). … IRSTLM: an open source toolkit for handling large scale language models. …

Edit Distance: A New Data Selection Criterion for Domain Adaptation in SMT. L Wang, DF Wong, LS Chao, J Xing, Y Lu, I Trancoso – RANLP, 2013 – aclweb.org … A 5-gram language model was trained on the target side of the training parallel corpus using the IRSTLM toolkit (Federico et al., 2008), exploiting improved Modified Kneser-Ney smoothing, and quantizing both probabilities and back-off weights. 3.3 Baseline System …
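
The quantization step mentioned above compresses the model by mapping each log-probability and back-off weight to one of a small number of codebook values (8-bit codebooks in IRSTLM). A rough Python sketch using equal-frequency binning; the exact IRSTLM binning algorithm may differ:

    # Quantize a list of log-probabilities into at most `levels` codebook
    # values via equal-frequency binning (sketch; IRSTLM may bin differently).
    def quantize(values, levels=256):
        ordered = sorted(values)
        bin_size = max(1, len(ordered) // levels)
        codebook = [sum(chunk) / len(chunk)
                    for chunk in (ordered[i:i + bin_size]
                                  for i in range(0, len(ordered), bin_size))]
        # each value is stored as the index of its nearest codebook entry
        codes = [min(range(len(codebook)), key=lambda k: abs(codebook[k] - v))
                 for v in values]
        return codebook, codes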

An Improved Patent Machine Translation System Using Adaptive Enhancement for NTCIR-10 PatentMT Task H Zhao, J Zhang, M Utiyama, E Sumita – Proceedings of NTCIR, 2013 – research.nii.ac.jp … For the rest three test datasets, we submitted translation results by hierarchical phrase-based SMT system since usually hierarchical … http://hlt.fbk.eu/en/irstlm … Cited by 1.

NAIST at 2013 CoNLL grammatical error correction shared task I Yoshimoto, T Kose, K Mitsuzawa, K Sakaguchi… – CoNLL-2013, 2013 – aclweb.org … as the alignment tool. The grow-diag-final heuristic was applied for phrase extraction. As a language modeling tool we used IRSTLM version 5.80 with Witten-Bell smoothing. … com/p/giza-pp/ … http://sourceforge.net/projects/irstlm/ … consisting of entries through 2012. … Cited by 6.

Bologna Translation Service: Improving Access To Educational Courses Via Automatic Machine Translation J Pietrzak, E Garcia, A Jauregi – Procesamiento del Lenguaje …, 2013 – journal.sepln.org … GIZA++ (Och and Ney, 2003) was used for word-alignment with the default number of iterations for the implementations of IBM Models. To build the language models (LM), we used the state-of-the-art open-source IRSTLM toolkit (Federico and Cettolo, 2007). The LMs …

Improving Word Translation Disambiguation by Capturing Multiword Expressions with Dictionaries L Bungum, B Gambäck, A Lynum, E Marsi – NAACL HLT 2013, 2013 – aclweb.org … performance. The n-gram models were built using the IRSTLM toolkit (Federico et al., 2008; Bungum and Gambäck, 2012) on the DeWaC corpus (Baroni and Kilgarriff, 2006), using the stopword list from NLTK (Loper and Bird, 2002). … Cited by 1.

An Online Service for SUbtitling by MAchine G van Loenhout, A Walker, Y Georgakopoulou… – 2013 – sumat-project.eu … model building plus decoding. To build the language models we have used the state-of-the-art open-source IRSTLM toolkit [Federico & Cettolo, 2007]. The development of the SMT systems has been incremental. A number of …

Omnifluent English-to-French and Russian-to-English systems for the 2013 Workshop on Statistical Machine Translation E Matusov, G Leusch – Proceedings of the Eighth Workshop on Statistical …, 2013 – aclweb.org … LMs were estimated and pruned using the IRSTLM toolkit (Federico et al., 2008). … Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. 2008. IRSTLM: an open source toolkit for handling large scale language models. In Proceedings of Interspeech, pages 1618–1621. … Cited by 3.

Tools for development and deployment of linguistic resources for HTR J Tanha, V Romero, J de Does – 2013 – transcriptorium.eu … Allow the user to set this probability (IRSTLM). – Use adapted perplexity (APP) … http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=User_Manual … Language models have to cope with out-of-vocabulary words, which are internally represented with the word class _unk_. …
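
On the _unk_ class and the user-settable OOV probability mentioned above: IRSTLM lets the user declare a dictionary upper bound (its dub parameter) so that the probability mass reserved for unknown words can be spread over the assumed number of unseen word types. A small illustrative computation (not the toolkit's code):

    import math

    # Spread the unknown-word mass p_unk over the out-of-vocabulary types
    # implied by a dictionary upper bound (dub); purely illustrative.
    def oov_word_logprob(p_unk, vocab_size, dub):
        unseen_types = max(1, dub - vocab_size)
        return math.log10(p_unk / unseen_types)

    # e.g. 1% unknown mass, 50k-word LM vocabulary, 10M-type upper bound
    print(oov_word_logprob(0.01, 50_000, 10_000_000))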

Implicitation of discourse connectives in (machine) translation T Meyer, B Webber – Proceedings of the 1st DiscoMT Workshop …, 2013 – infoscience.epfl.ch … The 3-gram language model was built with IRSTLM (Federico et al., 2008) over Europarl and the rest of WMT’s news data for FR and DE. 3.3 Results … Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. 2008. IRSTLM: an Open Source Toolkit for … Cited by 2.

Improving Language Model Adaptation using Automatic Data Selection and Neural Network. S Jalalvand – RANLP, 2013 – aclweb.org … On this data we trained a 4-gram back-off LM using the modified shift beta smoothing method as supplied by the IRSTLM toolkit (Federico, 2008). … 2008. IRSTLM: an Open Source Toolkit for Handling Large Scale Language Model. In Proc. …

The FBK ASR system for Evalita 2011 R Ronny, A Shakoor, F Brugnara, R Gretter – Evaluation of Natural …, 2013 – Springer … The LM was estimated using the IRSTLM toolkit [12] on the provided corpus of 32M running words, applying Kneser-Ney smoothing. … 825–828 (1991) 12. Federico, M., Bertoldi, N., Cettolo, M.: IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models. … Cited by 2.

Arabizi detection and conversion to Arabic K Darwish – arXiv preprint arXiv:1306.6755, 2013 – arxiv.org … We built a trigram language model using the IRSTLM language modeling toolkit (Federico et al., 2008). The advantage of this language model was that it contained both MSA and dialectal text. … 2008. IRSTLM: an open source toolkit for handling large scale language models. … Cited by 4.

Online learning approaches in computer assisted translation P Mathur, M Cettolo, M Federico… – Proceedings of the Eighth …, 2013 – aclweb.org … We created a 5-gram LM for TED talks and a 6-gram LM for the IT domain using IRSTLM (Federico et al., 2008) with improved Kneser-Ney smoothing (Chen and Goodman, 1996) on the target side of the training parallel corpora. … Cited by 5.

Simple, readable sub-sentences. S Klerke, A Søgaard – ACL (Student Research Workshop), 2013 – aclweb.org … dsl.dk/korpus2000/engelsk_hovedside … The LM was a 5-gram Kneser-Ney smoothed lowercase model, built using IRSTLM (Federico et al., 2008). 4.2 Experimental setup: Three system variants were set up to generate simplified output from the original news wire of the … Cited by 1.

Generative and discriminative methods for online adaptation in smt K Wäschle, P Simianer, N Bertoldi… – Proceedings of the …, 2013 – wiki.cl.uni-heidelberg.de … setting. The global 5-gram LM smoothed through the improved Kneser-Ney technique is estimated on the target monolingual side of the parallel training data using the IRSTLM toolkit (Federico et al., 2008). Models are case-sensitive. … 2008. IRSTLM: an Open Source Toolkit for … Cited by 5.

Parameter Optimization for Iterative Confusion Network Decoding in Weather-Domain Speech Recognition S Jalalvand, D Falavigna – eu-bridge.eu … Experiments: By using the IRSTLM toolkit [13], we train bi-gram and 4-gram back-off, modified shift-beta smoothed language models on the domain-related set (100MW) and use them in the ASR decoder for generating two different sets of word graphs (one with bi-gram …

iCPE: A Hybrid Data Selection Model for SMT Domain Adaptation L Wang, DF Wong, LS Chao, Y Lu, J Xing – … Computational Linguistics and …, 2013 – Springer … built using GIZA++ [26] and the training script of Moses. A 5-gram language model was trained using the IRSTLM toolkit [27], exploiting improved Modified Kneser-Ney smoothing, and quantizing both probabilities and back-off weights. 4.3 Baseline … Cited by 2.

Model for English-Urdu Statistical Machine Translation A Ali, A Hussain, MK Malik – World Applied Sciences Journal, 2013 – idosi.org … The model is trained on TrainSet using the Moses translation setup with the language modeling toolkit IRSTLM. Decoding of this model on TestSet gives the BLEU score … Conclusion and Future Work: There are certain words in the generated translations which are not at all converted …

Topic models for translation quality estimation for gisting purposes R Rubino, J de Souza, J Foster, L Specia – 2013 – doras.dcu.ie … and SYSTRAN. The MOSES system is a standard phrase-based SMT system trained using the Moses (Koehn et al., 2007) and IRSTLM (Federico et al., 2008) toolkits and optimised on a development set against BLEU (Papineni et al., 2002) using MERT (Och, 2003). … Cited by 2.

Cache-based online adaptation for machine translation enhanced computer assisted translation N Bertoldi, M Cettolo, M Federico – Proc. of MT Summit, Nice, …, 2013 – mtsummit2013.info … trained on the parallel training data; 6-gram (for IT and Legal systems) and 5-gram (for TED systems) LMs with improved Kneser-Ney smoothing (Chen and Goodman, 1999) were estimated on the target side of the training parallel data with the IRSTLM toolkit (Federico et al … Cited by 7.

Quality Estimation-guided Data Selection for Domain Adaptation of SMT P Banerjee, R Rubino, J Roturier… – MT Summit XIV: …, 2013 – mtsummit2013.info … All the LMs in our experiments are created using the IRSTLM (Federico et al., 2008) language modelling toolkit. … 2008. IRSTLM: an open source toolkit for handling large scale language models. In Interspeech 2008, pages 1618–1621, Brisbane, Australia. … Cited by 3.

Constrained grammatical error correction using Statistical Machine Translation Z Yuan, M Felice – CoNLL-2013, 2013 – aclweb.org … systems. 3.3 Tools: All our systems were built using the Moses SMT system (Koehn et al., 2007), together with Giza++ (Och and Ney, 2003) for word alignment and the IRSTLM Toolkit (Federico et al., 2008) for language modelling. … Cited by 2.

Community-based post-editing of machine-translated content: monolingual vs. bilingual L Mitchell, J Roturier, S O’Brien – Machine Translation Summit XIV – accept.unige.ch … the available monolingual English forum data (approx. a million sentences). It was trained using the IRSTLM (Federico et al., 2008) language modelling toolkit. To automatically achieve this, an unsupervised clustering approach …

Identifying multilingual Wikipedia articles based on cross language similarity and activity KN Tran, P Christen – Proceedings of the 22nd ACM international …, 2013 – dl.acm.org … Due to space limitations, we present only results for the Cosine similarity with a TF-IDF modification. We cannot use the BLEU score as a similarity measure because the articles in English … http://hlt.fbk.eu/en/irstlm … http://radimrehurek.com/gensim/ …

Sentence simplification as tree transduction D Feblowitz, D Kauchak – Proc. of the Second Workshop on Predicting …, 2013 – aclweb.org … The probability of the output tree’s yield, as given by an n-gram language model trained on the simple side of the training corpus using the IRSTLM Toolkit (Federico et al., 2008). … 2008. IRSTLM: An open source toolkit for handling large scale language models. … Cited by 2.

An English-to-Hungarian Morpheme-based Statistical Machine Translation System with Reordering Rules LJ Laki, A Novák, B Siklósi – ACL 2013, 2013 – aclweb.org … task. In all of our experiments, the Moses (Koehn et al., 2007) toolkit was used for building the translation models and performing the translation task itself, using IRSTLM (Federico et al., 2008) to build language models. Wherever … Cited by 1.

Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources Y Scherrer, B Sagot – RANLP Workshop on Adaptation of language …, 2013 – hal.inria.fr … 4.1.2 Training of the C-SMT model Our C-SMT model relies on the standard pipeline consisting of GIZA++ (Och and Ney, 2003) for character alignment, IRSTLM (Federico et al., 2008) for language modelling, and Moses (Koehn et al., 2007) for phrase extraction and decoding. … Cited by 1.

Translating video content to natural language descriptions M Rohrbach, W Qiu, I Titov, S Thater… – … Vision (ICCV), 2013 …, 2013 – ieeexplore.ieee.org … probability. Additionally, a reordering model is learned based on the training data alignment statistics [13]. To estimate the fluency of the descriptions we use IRSTLM [6], which is based on n-gram statistics of TACoS. The final … Cited by 3.

System Description of BJTU-NLP MT for NTCIR-10 PatentMT P Wu, J Xu, Y Yin, Y Zhang – Proceedings of NTCIR, 2013 – research.nii.ac.jp … Computational Linguistics, 29(1):19–52. [7] Marcello Federico, Nicola Bertoldi, Mauro Cettolo. 2008. IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models. In Proceedings of Interspeech 2008, 1618-1621. … Cited by 1.

EU-BRIDGE MT: Text Translation of Talks in the EU-BRIDGE Project M Freitag, S Peitz, J Wuebker, H Ney, N Durrani… – Proc. of IWSLT, 2013 – eu-bridge.eu … In order to focus it on the TED-specific domain and genre, and to reduce the size of the system, data selection by means of the IRSTLM toolkit [57] was performed on the whole parallel English–French corpus, using the WIT3 training data as in-domain data. … Cited by 2.

Topic dependent cross-word Spelling Corrections for Web Sentiment Analysis SA Jadhav, DVLN Somayajulu… – Advances in …, 2013 – ieeexplore.ieee.org … http://twitter4j.org/en/ … https://github.com/moses-smt/mosesdecoder … GIZA++ is a statistical machine translation toolkit: http://code.google.com/p/giza-pp/ … The IRST Language Modeling Toolkit: http://hlt.fbk.eu/en/irstlm … trained SCLTM is built. …

Context Dependent Bag of words generation SA Jadhav, DVLN Somayajulu… – Advances in …, 2013 – ieeexplore.ieee.org … http://onlineslangdictionary.com/ … https://github.com/moses-smt/mosesdecoder … The IRST Language Modeling Toolkit: http://hlt.fbk.eu/en/irstlm … GIZA++ is a statistical machine …

Report on the 10th iwslt evaluation campaign M Cettolo, J Niehues, S Stüker… – Proc. of the …, 2013 – workshop2013.iwslt.org … Translation and lexicalized reordering models were trained on the parallel training data; 5-gram LMs with improved Kneser-Ney smoothing were estimated on the target side of the training parallel data with the IRSTLM toolkit [42]. … Cited by 10.

Efficient Language Modeling Algorithms with Applications to Statistical Machine Translation K Heafield – 2013 – kheafield.com … Table 1.1: Adoption of language model toolkits including this work, SRILM (Stolcke, 2002), IRSTLM (Federico et al., 2008), and BerkeleyLM (Pauls and Klein, 2011) by participants in the translation task of the 2013 Workshop on Machine Translation (Bojar et al., 2013b). …

Efficient solutions for word reordering in German-English phrase-based statistical machine translation A Bisazza, M Federico – 8th Workshop on Statistical Machine Translation, 2013 – aclweb.org … the Berkeley Aligner (Liang et al., 2006). The target language model is estimated by the IRSTLM toolkit (Federico et al., 2008) with modified Kneser-Ney smoothing (Chen and Goodman, 1999). The phrase-based baseline decoder includes … Cited by 1.

How hard is it to automatically translate phrasal verbs from English to French? BP LIG-GETALP – MULTI-WORD UNITS IN MACHINE …, 2013 – mtsummit2013.info … Described in more detail in the Moses online documentation, at http://www.statmt.org/moses/?n=Moses.Baseline. … using the grow-diag-final heuristic. Language models were estimated from the French part of the parallel training corpus using 5-grams with IRSTLM. …

Are ACT’s scores increasing with better translation quality? N Hajlaoui – Are ACT’s scores increasing with better translation …, 2013 – infoscience.epfl.ch … Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. 2008. IRSTLM: an open source toolkit for handling large scale language models. In Proceedings of Interspeech, Brisbane, Australia. Najeh Hajlaoui and Andrei Popescu-Belis. 2012. … Cited by 1.

Statistical Machine Translation Model for English to Urdu Machine Translation RB Mishra – Artificial Intelligence and Soft Computing – researchgate.net … translations. Modified Kneser-Ney discounting is used as smoothing scheme for training 5-gram language model. There are more open source statistical language modeling toolkits available like IRSTLM, RandLM and KenLM. …

Joint space neural probabilistic language model for statistical machine translation T Okita – arXiv preprint arXiv:1301.3614, 2013 – arxiv.org … 4 Intrinsic Evaluation: We compared the perplexity of ngram-HMM LM (1 feature), ngram-HMM LM (2 features, the same as in this paper and genre ID is 4 class), modified Kneser-Ney smoothing (irstlm) [18], and hierarchical Pitman-Yor LM [48]. … Cited by 2.
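
For the intrinsic comparison above, perplexity over held-out text w_1 … w_N is the standard measure (lower is better):

    \[
    \mathrm{PP} \;=\; \exp\!\Bigl(-\frac{1}{N}\sum_{t=1}^{N} \ln p\bigl(w_t \mid w_{t-n+1}^{\,t-1}\bigr)\Bigr)
    \]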

Unsupervised and Semi-supervised Myanmar Word Segmentation Approaches for Statistical Machine Translation YK Thu, A Finch, E Sumita, Y Sagisaka – saki.siit.tu.ac.th … blog data [16]. Standard Phrase Based Statistical Machine Translation (PBSMT) systems were trained using GIZA++ [17] for alignment; language modeling was done using IRSTLM version 5.80.01 [18]. Minimum error rate …

Translate gene sequence into gene ontology terms based on statistical machine translation W Liang, ZK Yong – F1000Research, 2013 – f1000research.com … http://www.statmt.org/moses/?n=Development.GetStarted. (2) You should run the Moses baseline: http://www.statmt.org/moses/?n=Moses.Baseline. (3) Python 2.7. Here we use IRSTLM to train the language model, as this baseline suggests. 2.2 Parallel corpus for training: …

Improving function word alignment with frequency and syntactic information J Zhang, H Zhao – Proceedings of the Twenty-Third international joint …, 2013 – dl.acm.org … http://www.statmt.org/wmt11/ … http://hlt.fbk.eu/en/irstlm … http://nlp.stanford.edu/software/lex-parser.shtml … Table 3: Parameter settings for translation tasks on different language pairs (language pair / threshold / strategy): CE 0.1 Strategy 2; DE 0.01 Strategy 1; FE 0.02 Strategy 2. …

Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation. A Bisazza, M Federico – TACL, 2013 – transacl.org … As proposed by Johnson et al. (2007), statistically improbable phrase pairs are removed from the translation model. The language models are estimated by the IRSTLM toolkit (Federico et al., 2008) with modified Kneser-Ney smoothing (Chen and Goodman, 1999). … Cited by 1.

First report on user-adaptive MT M Federico, M Cettolo, N Bertoldi – matecat.com … This document is part of the Project “Machine Translation Enhanced Computer Assisted Translation (MateCat)”, funded by the 7th Framework Programme of the European Commission through grant agreement no. 287688. …

Robustness of Distant-Speech Recognition and Speaker Identification–Development of Baseline System G Potamianos, A Abad, A Brutti, M Hagmuller, G Kubin… – 2013 – dirha.fbk.eu … industrial applications. FBK is also engaged in the development of IRSTLM, an open source toolkit that provides algorithms and data structures to estimate, store, and access very large statistical language models [25]. New dedicated … Cited by 1.

Second Report on Lab and Field Test N Bertoldi, M Cettolo, M Negri, M Turchi, M Federico… – 2013 – matecat.com … training data (Table 2); 5-gram LMs smoothed through the improved Kneser-Ney technique [Chen and Goodman, 1999] are estimated on the target side by means of the IRSTLM toolkit [Federico et al., 2008]. The weights …

Lexicon-supported OCR of eighteenth century Dutch books: a case study J de Does, K Depuydt – IS&T/SPIE …, 2013 – proceedings.spiedigitallibrary.org … 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities, 33-38 (2011). [9] Federico, M., Bertoldi, N. and Cettolo, M., “IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models”, Proc. … Cited by 6.

Statistical machine translation system for English to Urdu RB Mishra – International Journal of Advanced Intelligence …, 2013 – Inderscience … Modified Kneser-Ney discounting is used as smoothing scheme for training 5-gram language model. There are more open source statistical language modelling toolkits available like IRSTLM, RandLM and KenLM. 4.4 Translation model …

Domain Adaptation for Statistical Machine Translation of Corporate and User-Generated Content P Banerjee – 2013 – nclt.computing.dcu.ie … A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.) to the Dublin City University … Cited by 1.

Improving the quality of MT output using novel name entity translation scheme D Bhalla, N Joshi, I Mathur – Advances in Computing, …, 2013 – ieeexplore.ieee.org … [13] To download IRSTLM toolkit: http://www.statmt.org [14] Daniel Jurafsky, James H. Martin, Speech and Language Processing: An Introduction to Speech Recognition, Natural Language Processing, and Computational Linguistics. … Cited by 1.

Quality Estimation Software Extensions L Specia, K Shah, E Avramidis – 2013 – qt21.eu … FP7-ICT Coordination and Support Action (CSA) QTLaunchPad (No. 296347): Preparation and Launch of a Large-scale Action for Quality Translation Technology. Deliverable D2.1.2: Quality Estimation Software Extensions …

Computing n-gram statistics in MapReduce K Berberich, S Bedathur – … of the 16th International Conference on …, 2013 – dl.acm.org … Klaus Berberich, Max Planck Institute for Informatics, Saarbrücken, Germany; Srikanta Bedathur, Indraprastha Institute of Information Technology, New Delhi, India. … Cited by 4.

A fast and flexible architecture for very large word n-gram datasets M Flor – Natural Language Engineering, 2013 – Cambridge Univ Press … Natural Language Engineering 19 (1): 61–93. © Cambridge University Press 2012. doi:10.1017/S1351324911000349 … Cited by 8.

Machine Translation of Film Subtitles from English to Spanish J Isele – 2013 – mlta.uzh.ch … Institut für Computerlinguistik. Machine Translation of Film Subtitles from English to Spanish: Combining a Statistical System with Rule-based Grammar Checking. Master's thesis, Faculty of Arts, University of Zurich. Supervisor: Prof. Dr. M. Volk. Author: …

Automatically detected acoustic landmarks for assessing natural emotion from speech H Sierro – 2013 – diuf.unifr.ch … Hervé Sierro (herve.sierro@unifr.ch), BENEFRI Master Student, Document, Image and Voice Analysis group, University of Fribourg. Thesis Supervisor: Dr. Fabien Ringeval. … Cited by 1.

Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation A Bisazza – 2013 – eprints-phd.biblio.unitn.it … PhD Dissertation, International Doctorate School in Information and Communication Technologies, DISI – University of Trento. Advisor: … Cited by 1.