Truecasing - Meta-Guide.com

Notes:

Truecasing is a natural language processing (NLP) task that restores or corrects the capitalization of text that is badly cased or carries no case information, so that it follows the conventions of a particular language or style. Correctly cased text is easier to read and easier for downstream tools to process.

In dialog systems, truecasing keeps the text the system handles and generates consistent with the capitalization conventions of the target language or style. This matters for the clarity and readability of responses and for the overall user experience.

There are several ways in which truecasing can be used in dialog systems:

Preprocessing: Truecasing can normalize the capitalization of user input before the dialog system processes it, which helps the system interpret the user’s words and phrases correctly.
Output generation: Truecasing can be applied to text generated by the dialog system so that it follows the conventions of the target language or style, improving readability and clarity.
Error correction: Truecasing can automatically fix capitalization errors in user input or system output, improving the overall accuracy and reliability of the dialog system.
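As a rough illustration of the preprocessing and output-correction uses above, here is a minimal sketch of a frequency-based truecaser in Python. It is not any of the methods from the papers listed on this page: the function names (`train_truecaser`, `truecase`), the whitespace tokenization, and the tiny training corpus are all assumptions made for the example. The idea is simply to record the most frequent surface form of each word in well-cased text, ignoring sentence-initial tokens (whose capitalization is positional), and then to map each input token to that form.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent casing of each word from well-cased sentences.

    Sentence-initial tokens are skipped because their capital letter comes
    from position in the sentence, not from the word itself.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()  # assumption: simple whitespace tokenization
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    # Keep only the single most frequent surface form per lowercased word.
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def truecase(text, model):
    """Restore casing token by token, then capitalize the first token."""
    # Unknown words fall back to lowercase in this sketch.
    tokens = [model.get(tok.lower(), tok.lower()) for tok in text.split()]
    if tokens:
        tokens[0] = tokens[0][0].upper() + tokens[0][1:]
    return " ".join(tokens)


# Hypothetical training data, for illustration only.
corpus = [
    "We flew to Paris with NASA engineers",
    "The NASA team visited Paris again",
]
model = train_truecaser(corpus)
print(truecase("the nasa team flew to paris", model))
```

A dialog system could call a function like `truecase` on lowercased user input before further analysis, or on generated text before display. A unigram model of this kind cannot resolve words whose casing depends on context, which is why several of the papers below use n-gram language models, HMMs, or maximum entropy models instead.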

Wikipedia:

Truecasing

See also:

Tasks Of Natural Language Processing

[PDF] Truecasing clinical narratives. M Kreuzthaler, S Schulz – Studies in Health Technology and …, 2011 – person.hst.aau.dk Correction phenomenon (corrected / total, units): right case correction of normal tokens 896 / 909 tokens; right case correction of acronyms 13 / 16 tokens; correction of diacritics (“ä”, “ö”, “ü”, “ß”) 73 / 80 occurrences; “c”, “k”, “z”-variants corrected 4 / 21 occurrences; Meaning of sentence … Related articles All 3 versions

[PDF] from upenn.edu tRuEcasIng LV Lita, A Ittycheriah, S Roukos… – Proceedings of the 41st …, 2003 – dl.acm.org Abstract Truecasing is the process of restoring case information to badly-cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language modeling based truecaser which achieves an accuracy of ~98% on news articles. Task … Cited by 44 Related articles All 31 versions

[PDF] from aclweb.org More linguistic annotation for statistical machine translation P Koehn, B Haddow, P Williams, H Hoang – Proceedings of the Joint …, 2010 – dl.acm.org … tokenization with hyphen splitting • truecasing • grow-diag-final-and alignment heuristic • msd-bidirectional-fe lexicalized reordering … Table 3: Effect of truecasing: cased and uncased BLEU scores timized on the development set newsdev2009. … Cited by 12 Related articles All 17 versions

[PDF] from rug.nl [PDF] Edinburgh’s submission to all tracks of the WMT2009 shared task with reordering and speed improvements to Moses P Koehn, B Haddow – Proceedings of the Fourth Workshop on …, 2009 – acl.eldoc.ub.rug.nl … This paper describes the configuration of the systems, plus novel contributions to Moses including truecasing, more efficient decoding methods, and a framework to specify reordering constraints. 1 Introduction … Perplexity numbers are shown in Table 1. 2.2 Truecasing … Cited by 21 Related articles All 22 versions

[PDF] from nrc-cnrc.gc.ca [PDF] Truecasing for the Portage system A Agbago, R Kuhn, G Foster – 2005 – nparc.cisti-icist.nrc-cnrc.gc.ca Abstract: This paper presents a truecasing technique-that is, a technique for restoring the normal case form to an all lowercased or partially cased text. The technique uses a combination of statistical components, including an N-gram language model, a case … Cited by 10 Related articles All 11 versions

[PDF] from statmt.org Towards better machine translation quality for the German–English language pairs P Koehn, A Arun, H Hoang – Proceedings of the Third Workshop on …, 2008 – dl.acm.org … hyph. + truecase 20.7 (+0.4) 27.8 (+0.2) Table 2: Impact of truecasing on case-sensitive BLEU … A final modification to the data preparation is truecasing. Traditionally, we lowercase all training and test data, but especially in German, case marks important distinctions. … Cited by 19 Related articles All 20 versions

[PDF] from microsoft.com [PDF] DOES CAPITALIZATION MATTER IN WEB SEARCH? S Cucerzan – Proceedings of KDIR 2010, 2010 – research.microsoft.com Page 1. DOES CAPITALIZATION MATTER IN WEB SEARCH? Silviu Cucerzan Microsoft Research, 1 Microsoft Way, Redmond, USA silviu@microsoft.com Keywords: Web search, queries, capitalization, truecasing, ranking. … Related articles All 4 versions

[PDF] from statmt.org Hierarchical phrase-based MT at the Charles University for the WMT 2011 shared task D Zeman – Proceedings of the Sixth Workshop on Statistical …, 2011 – dl.acm.org … The de- coder must use the SWIG-linked SRILM library because Java-based language modeling is too slow and memory-consuming. 4.3 Supervised Truecasing … As contrastive runs we applied the supervised truecasing to other directions as well. … Cited by 2 Related articles All 25 versions

[PDF] from ehu.es [PDF] Using Apertium linguistic data for tokenization to improve Moses SMT performance SO Rojas, SC Vaillo, UMH Campus, E Quorum III – LIHMT 2011, 2011 – ixa2.si.ehu.es … The new method involves reusing the morphological analyser and part-of-speech tagger of the Apertium rule-based machine translation system to enrich the default tokenization used in Moses with part-of-speech-based truecasing, multi-word-unit chunking, number … Related articles All 3 versions

A case study of using web search statistics: case restoration S Cucerzan – Computational Linguistics and Intelligent Text …, 2010 – Springer … 1.1 Case Restoration Case restoration (also known as truecasing) is a lexical disambiguation task that addresses the problem of adding or restoring capitalization information to a text that misses or has inconsistent such information. … Cited by 3 Related articles All 4 versions

[PDF] from mit.edu [PDF] The JHU workshop 2006 IWSLT system W Shen, R Zens, N Bertoldi, M Federico… – eps, 2006 – extwebprod.ll.mit.edu … For this year’s IWSLT evaluation we applied factored translation models to the problem of TrueCasing MT output. We apply a simple HMM model for truecasing [11] implemented using the disambig tool developed by SRI [9]. Our model can be defined as follows: … Cited by 20 Related articles All 20 versions

[PDF] from nrc-cnrc.gc.ca Lessons from NRC’s Portage system at WMT 2010 S Larkin, B Chen, G Foster, U Germann… – Proceedings of the …, 2010 – dl.acm.org … In WMT 2010, Portage scored 28.5 BLEU (uncased) for FE, but only 27.0 BLEU (uncased) for EF. For both language pairs, Portage truecasing caused a loss of 1.4 BLEU; other WMT systems typically lost around 1.0 BLEU after truecasing. … 4.5 Fixing truecasing … Cited by 4 Related articles All 18 versions

[PDF] from upenn.edu [PDF] A unified tagging approach to text normalization C Zhu, J Tang, H Li, HT Ng, T Zhao – ANNUAL MEETING- …, 2007 – acl.ldc.upenn.edu … For the case restoration subtask (processing on token sequence), we employed the TrueCasing method (Lita et al., 2003). … The CRF based casing method estimates a conditional probabilistic model using the same data and the same tags defined in TrueCasing. … Cited by 4 Related articles BL Direct All 50 versions

[CITATION] tRuEcasIng LV Lita, A Ittycheriah, S Roukos, N Kambhatla – Proceedings of ACL, 2003 Cited by 2 Related articles

[PDF] from euromatrix.net [PDF] 2.2: Refined Factored Translation Model H Hoang, P Koehn, A Arun, B Haddow – 2009 – euromatrix.net Page 1. 2.2: Refined Factored Translation Model Hieu Hoang, Philipp Koehn, Abhishek Arun, Barry Haddow Distribution: Final EuroMatrix Statistical and Hybrid Machine Translation Between All European Languages IST 034291 Deliverable 2.2 February 27, 2009 … Related articles All 3 versions

[PDF] from mt-archive.info [PDF] Bringing humans into the loop: localization with machine translation at Traslan D Groves, C Wicklow – Proceedings of the Conference of the …, 2008 – mt-archive.info … 5.3.1 Truecasing & Capitalization Issues One of the main errors which irritated our translators is the matter of truecasing. Truecasing is the process of restoring case information to badly-cased or noncased text (Lita et al., 2003). … Cited by 7 Related articles All 8 versions

[PDF] from mu.oz.au Restoring punctuation and casing in English text T Baldwin, M Joseph – AI 2009: Advances in Artificial Intelligence, 2009 – Springer … [10] look into the task of truecasing, or case restoration of text. … 549 mentioned above and the truecasing task is misleading: the accuracy reported for the punctuation tasks is at the sentence level, whereas, in case of the truecasing task it is at the word level. 3 Task Description … Cited by 4 Related articles All 4 versions

[PDF] from umontreal.ca [PDF] PORTAGE in the NIST 2009 MT Evaluation G Foster, B Chen, E Joanis, H Johnson… – NRC-CNRC, Tech. …, 2009 – iro.umontreal.ca … 29 8.2 Feature Selection . . . . . 30 9 Truecasing 31 9.1 Naive George versus HMM . . . . . 31 9.2 Tuning HMM Truecasing . . . . . 32 9.3 Title Capitalization . . . . . 32 10 Conclusions 33 … Cited by 2 Related articles

[PDF] from psu.edu Language dynamics and capitalization using maximum entropy F Batista, N Mamede, I Trancoso – … of the 46th Annual Meeting of the …, 2008 – dl.acm.org … 1 Introduction The capitalization task, also known as truecasing (Lita et al., 2003), consists of rewriting each word of an input text with its proper case information. … LV Lita, A. Ittycheriah, S. Roukos, and N. Kambhatla. 2003. tRuEcasIng. In Proc. … Cited by 7 Related articles All 15 versions

[PDF] from fbk.eu [PDF] Creating Term and Lexicon Entries from Phrase Tables G Thurmair, V Aleksic – Proc. EAMT, Trento This paper describes the …, 2012 – hltshare.fbk.eu … Lemma creation implies the creation of a canonical form for the entry. This has two aspects: • Truecasing of all lemma parts: Proper names and German common nouns should be capitalized, the other forms lowercased. • Production of the canonical form of the lemma. … Related articles All 5 versions

[PDF] from dtic.mil The MIT-LL/AFRL IWSLT-2006 MT System W Shen, B Delaney, T Anderson – 2006 – DTIC Document … Table 6: Effects of different pre/post-processing methods (dev4) To produce mixed-case output, we applied implemented an HMM-based truecasing model as proposed in [26]: … Similarly, small gains can be had by choosing the appropriate language model order for TrueCasing. … Cited by 30 Related articles All 30 versions

[PDF] from aclweb.org Phrasal: A toolkit for statistical machine translation with facilities for extraction and incorporation of arbitrary model features D Cer, M Galley, D Jurafsky, CD Manning – Proceedings of the NAACL …, 2010 – dl.acm.org … Two n-gram language models are trained on the target.txt sentences: one over lowercased target sentences that will be used by the Phrasal decoder and one over the original source sentences that will be used for truecasing the MT output. … Cited by 5 Related articles All 26 versions

[PDF] from inesc-id.pt [PDF] Keyphrase Cloud Generation of Broadcast News L Marujo, M Viveiros, JP Neto – proceeding of 12th Annual Conference of …, 2011 – inesc-id.pt … The generation of the missing information makes this representation format easier to read and understand, and mitigates problems to further automatic processing [2]. Capitalization, also known as truecasing, improves human readability, parsing, and NER (Named Entity … Cited by 3 Related articles All 3 versions

[PDF] from yum2.net [PDF] Spell checking techniques for replacement of unknown words and data cleaning for Haitian Creole SMS translation S Stymne – Proceedings of the Sixth Workshop on Statistical …, 2011 – t3-1.yum2.net … the corpus when it is not sentence initial. In the noisy SMS data, though, there were many sentences with all capital letters that would influence this truecasing method negatively. To address this, we modified the algorithm to … Cited by 1 Related articles All 15 versions

[CITATION] RWTH’s system combination for the NIST 2009 MT ISC evaluation G Leusch, S Hasan, S Mansour, M Huck, H Ney – NIST Open Machine Translation …, 2009 Cited by 2 Related articles All 2 versions

[PDF] from lingol.cz An experimental management system P Koehn – The Prague Bulletin of Mathematical Linguistics, 2010 – Versita … OUTPUT-FACTOR: commands to create factors • TRAINING: training a translation model • LM: training a language model • INTERPOLATED-LM: interpolate language models • SPLITTER: training a word splitting model • RECASING: training a recaser • TRUECASING: training a … Cited by 6 Related articles All 9 versions

Adaptation of exponential models CI Chelba, A Acero – US Patent 7,860,314, 2010 – Google Patents … 18th International Conf. on Machine Learning; pp. 282-289, Morgan Kauffman, San Francisco, CA. L. Lita, A. Ittycheriah, S. Roukos, and N. Kambhatla, 2003, “tRuEcasIng,” In Proceedings of ACL, pp. 152-159, Sapporo, Japan. … Related articles All 5 versions

[PDF] from mt-archive.info [PDF] The MIT-LL/AFRL IWSLT-2011 MT System AR Aminzadeh, T Anderson, R Slyh, B Ore… – Proceedings of the …, 2011 – mt-archive.info … 2.2. Language Model Training During the training process we built n-gram language models for use in decoding/rescoring, TrueCasing and repunctuation. In all cases, the MIT Language Modeling Toolkit [13] was used to create interpolated Knesser-Ney LMs. … Cited by 3 Related articles All 4 versions

[PDF] from pitt.edu [PDF] Transcribing human-directed speech for spoken language processing M Ostendorf – Proc. INTERSPEECH, 2009 – cs.pitt.edu … Having such markup, along with simple inverse text normalization transformations (e.g., of word strings into numbers) and truecasing, makes speech more like text and therefore makes it possible to use linguistic resources based on written text for training spoken language … Cited by 2 Related articles All 5 versions

[PDF] from aclweb.org Phrase-based and deep syntactic English-to-Czech statistical machine translation O Bojar, J Hajic – Proceedings of the third Workshop on Statistical …, 2008 – dl.acm.org … followed by Minnen et al. (2001) for English. We symmetrized the two GIZA++ runs using grow-diag-final heuristic. Truecasing. We attempted to preserve meaning-bearing case distinctions. The Czech lemmatizer produces case … Cited by 22 Related articles All 17 versions

Electronic mail data cleaning H Li, Y Cao, ZH Tang – US Patent 7,590,608, 2009 – Google Patents … In ICML 01, 2001. LV Lita, A. Ittycheriah, S. Roukos, and N. Kambhatla. tRuEcasIng. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003), Jul. 7-12, Sapporo, Japan. A. McCallum, D. Freitag, and F. Pereira. … Related articles All 4 versions

[PDF] from inesc-id.pt [PDF] The impact of language dynamics on the capitalization of broadcast news F Batista, N Mamede, I Trancoso – Proc. of Interspeech, 2008 – inesc-id.pt … 1. Introduction The capitalization, also known as truecasing [1], consists of rewriting each word of an input text with its proper case information. … 8. References [1] LV Lita, A. Ittycheriah, S. Roukos, and N. Kambhatla, “tRuEcasIng,” in Proc. … Cited by 6 Related articles All 3 versions

[PDF] from 55-works.com Reproducible results in parsing-based machine translation: the JHU shared task submission L Schwartz – Proceedings of the Joint Fifth Workshop on Statistical …, 2010 – dl.acm.org … Optional front-end wrapper scripts can also be provided, allowing for a complete experiment to be run – from downloading data and software through truecasing translated results – by executing a single make file. This framework is also conducive to parallelization. … Cited by 7 Related articles All 17 versions

[PDF] from aclweb.org The RALI machine translation system for WMT 2010 S Huet, J Bourdaillet, A Patry, P Langlais – … of the Joint Fifth Workshop on …, 2010 – dl.acm.org … obtained a 23.24% case-insensitive BLEU and a 22.13% case-sensitive BLEU. As truecasing induces an increase of the two metrics, we built all our models in truecase. The results shown in the remainder of this paper are … Cited by 3 Related articles All 29 versions

[PDF] from inesc-id.pt A critical survey on the use of Fuzzy Sets in Speech and Natural Language Processing JP Carvalho, F Batista, L Coheur – Fuzzy Systems (FUZZ-IEEE) …, 2012 – ieeexplore.ieee.org … transcription. Namely, one must deal with word boundary detection, sentence boundary detection, punctuation detection, truecasing, speaker recognition (which can be important when several participants are involved), etc. … Related articles All 2 versions

[PDF] from inesc-id.pt [PDF] Recovering Capitalization and Punctuation Marks on Speech Transcriptions F Batista, N Mamede – 2011 – inesc-id.pt … al., 2008). The capitalization task, also known as truecasing (Lita et al., 2003), consists of assigning to each word of an input text its corresponding case information, which sometimes depends on its context. Proper capitalization … Related articles All 5 versions

[PDF] from ieeta.pt [PDF] Automatic Recovery of Punctuation Marks and Capitalization Information for Iberian Languages F Batista, I Trancoso, N Mamede – I Joint SIG-IL/Microsoft Workshop on …, 2009 – ieeta.pt … the ASR output. 2.2. Capitalization The capitalization task, also known as truecasing [13], consists of assigning the proper case information to each input word, which may depend on the context. Proper capitalization can be … Cited by 2 Related articles All 6 versions

[PDF] from naist.jp [PDF] The University of Washington Machine Translation System for IWSLT 2009 M Yang, A Axelrod, K Duh, K Kirchhoff – Proc. of IWSLT, 2009 – cl.naist.jp … In order to match the evaluation guidelines, we post-processed the output by re-attaching the possessive particle and restoring true case. Truecasing is done by a noisy-channel model as implemented in the disambig tool in the SRILM package. … Cited by 1 Related articles All 5 versions

[PDF] from dtic.mil The MIT-LL/AFRL IWSLT-2010 MT System W Shen, T Anderson, R Slyh, AR Aminzadeh – 2010 – DTIC Document … 2.2. Language Model Training During the training process we built n-gram language models for use in decoding/rescoring, TrueCasing and repunctuation. In all cases, the SRI Language Modeling Toolkit [12] was used to create interpolated Knesser-Ney LMs. … Cited by 2 Related articles All 7 versions

[PDF] from nict.go.jp [PDF] The XMU phrase-based statistical machine translation system for IWSLT 2006 Y Chen, X Shi, C Zhou – Proc. of the International Workshop on Spoken …, 2006 – nict.go.jp … sentences. • Truecasing of the first word of an English sentence: To transform the uppercase version of the beginning words of English sentences into their lowercase version if their lowercase version occur more often. 2.2. Word … Cited by 5 Related articles All 9 versions

[PDF] from nict.go.jp [PDF] The RWTH statistical machine translation system for the IWSLT 2006 evaluation A Mauser, R Zens, E Matusov, S Hasan… – Proc. of the Int. Workshop …, 2006 – nict.go.jp … 5. add additional data (dev-corpora) translation: 1. translate test data (N-best list generation and rescoring) 2. postprocessing and truecasing systems for Japanese-English and Chinese-English almost identical (less reordering for JE) tuning for TEXT, no changes for ASR output … Cited by 50 Related articles All 21 versions

[PDF] from nrc-cnrc.gc.ca [PDF] Rule-based translation with statistical phrase-based post-editing M Simard, N Ueffing, P Isabelle, R Kuhn – 2007 – nparc.cisti-icist.nrc-cnrc.gc.ca … material using standard information retrieval techniques. • A 5-gram truecasing model, trained on the combined Europarl and News Commentary target-language corpora. 2.3 Training data Ideally, the training material for the … Cited by 70 Related articles All 36 versions

[PDF] from kit.edu [PDF] The ISL statistical machine translation system for the TC-STAR spring 2006 evaluation M Kolss, B Zhao, S Vogel… – … on Speech-to- …, 2006 – isl.anthropomatik.kit.edu … Table 1 shows the training and test corpus statistics after preprocessing. For scoring, generated punctuation marks were re-attached to words, and a truecasing module was run to restore case information. Our truecasing module … Cited by 7 Related articles All 11 versions

[PDF] from fbk.eu [PDF] The XMU SMT System for IWSLT 2007 Y Chen, X Shi, C Zhou – Proc. of the International Workshop on …, 2007 – iwslt07.fbk.eu … sentences. • Truecasing of the first word of an English sentence: To transform the uppercase version of the beginning words of English sentences into their lowercase version if their lowercase version occur more often. 2.2. Word … Cited by 2 Related articles All 4 versions

[PDF] from washington.edu [PDF] The University of Washington machine translation system for the IWSLT 2007 competition K Kirchhoff, M Yang – Proc. of IWSLT, 2007 – crow.ee.washington.edu … 2.3. Postprocessing The output from the second decoding pass is postprocessed to restore true case and punctuation. We use a hidden-event n-gram model [9, 10] to restore punctuation and a noisy-channel model for truecasing. … Cited by 8 Related articles All 14 versions

[PDF] from pascal-network.org Productive generation of compound words in statistical machine translation S Stymne, N Cancedda – Proceedings of the Sixth Workshop on …, 2011 – dl.acm.org … structured perceptrons (Collins, 2002), 1Nouns in German are capitalized. This is normally dealt as a further “truecasing” postprocessing, and is an orthogonal problem from the one we deal with here. and more. Since the focus … Cited by 1 Related articles All 15 versions

[PDF] from nrc-cnrc.gc.ca [PDF] NRC’s PORTAGE system for WMT 2007 U Ueffing, M Simard, S Larkin… – … : Proceedings of the …, 2007 – nparc.cisti-icist.nrc-cnrc.gc.ca … Zens and Ney, 2006). This year, we increased the length of the N-best lists from 1,000 to 5,000. 3.4 Post-processing For truecasing the translation output, we used the model described in (Agbago et al., 2005). This model uses a … Cited by 30 Related articles All 33 versions

[PDF] from ehu.es [PDF] Reordering by Parsing J Elming, M Haulrich – LIHMT 2011, 2011 – ixa2.si.ehu.es … et al., 2007). We use the baseline system from WMT113 as our baseline with the small modifications that we use truecasing instead of lowercasing and recasing, and allow training sentences of up to 80 words. For our reordering … Related articles All 4 versions

[PDF] from cuni.cz [PDF] Extracting Characteristics of Human-produced Video Descriptions M Korvas – 2012 – ufal.mff.cuni.cz Page 1. Extracting Characteristics of Human-produced Video Descriptions A thesis presented by Matej Korvas to the Department of Computing and Information Systems in partial fulfillment of the requirements for the degree of Master of Science University of Melbourne …

[PDF] from cnrs.fr Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news F Batista, D Caseiro, N Mamede, I Trancoso – Speech Communication, 2008 – Elsevier … Keywords: Rich transcription; Punctuation recovery; Sentence boundary detection; Capitalization; Truecasing; Maximum entropy; Language modeling; Weighted finite state transducers. Article Outline. 1. Introduction; 1.1. Related work on capitalization; 1.2. … Cited by 19 Related articles All 9 versions

[PDF] from rwth-aachen.de Combining natural language processing systems to improve machine translation of speech E Matusov – 2009 – darwin.bth.rwth-aachen.de … 133 6.5.1 Using a language model . . . . . 133 6.5.2 Handling of sentence segmentation differences . . . . . 134 6.5.3 TrueCasing . . . . . 135 6.6 Experimental results . . . . . … Cited by 1 Related articles All 7 versions

[PDF] from psu.edu [PDF] Stanford University’s Arabic-to-English Statistical Machine Translation System for the 2009 NIST MT Open Evaluation M Galley, DC Spence Green, PC Chang, CD Manning – 2010 – Citeseer … Lucian Vlad Lita, Abe Ittycheriah, Salim Roukos, and Nanda Kambhatla. 2003. tRuEcasIng. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 152–159. Einat Minkov, Richard Wang, Anthony Tomasic, and William Cohen. 2006. … Cited by 2 Related articles All 9 versions

[PDF] from upenn.edu Capitalizing machine translation W Wang, K Knight, D Marcu – Proceedings of the main conference on …, 2006 – dl.acm.org … 1 Introduction Capitalization is the process of recovering case information for texts in lowercase. It is also called truecasing (Lita et al., 2003). Usually, capitalization itself tries to improve the legibility of texts. … 2003. tRuEcasIng. … Cited by 21 Related articles All 28 versions

[PDF] from inesc-id.pt Comparing automatic rich transcription for portuguese, spanish and english broadcast news F Batista, I Trancoso, NJ Mamede – … Speech Recognition & …, 2009 – ieeexplore.ieee.org … B. Capitalization The capitalization task, also known as truecasing [15], consists of assigning the proper case information to each input word, which may depend on the context. … 1, 2004, pp. 525–528. [15] LV Lita, A. Ittycheriah, S. Roukos, and N. Kambhatla, “tRuEcasIng,” in Proc. … Cited by 4 Related articles All 2 versions

[PDF] from rwth-aachen.de [PDF] Lexicon Models for Hierarchical Phrase-Based Machine Translation M Huck, S Mansour, S Wiesler… – Proc. of the Int. …, 2011 – www-i6.informatik.rwth-aachen.de … 4.2. Chinese→English NIST Task For the Chinese→English task we work with a parallel training corpus of 3.0M Chinese-English sentence pairs. The English target side of the data is lowercased, truecasing is part of the postprocessing pipeline. … Cited by 4 Related articles All 6 versions

Syntax-based statistical translation model K Yamada, K Knight – US Patent 8,214,196, 2012 – Google Patents Page 1. US008214196B2 (12) United States Patent (10) Patent No.: US 8,214,196 B2 Yamada et al. (45) Date of Patent: Jul. 3, 2012 (54) SYNTAX-BASED STATISTICAL 4,791,587 A 12/1988 Doi TRANSLATION MODEL 4,800,522 A 1/1989 Miyao et al. … Related articles All 4 versions

[PDF] from microsoft.com [PDF] Entity Disambiguation based on a Probabilistic Taxonomy M Shirakawa, H Wang, Y Song, Z Wang, K Nakayama… – 2011 – research.microsoft.com … In our method, we combine both approaches to detect terms as a person does. Our two challenges on entity disambiguation are supported by some preprocessing and postprocessing such as sentence breaking, concept acquisition and truecasing. … Related articles

[PDF] from mit.edu Sepia: Semantic parsing for named entities GA Marton – 2003 – dspace.mit.edu Page 1. Sepia: Semantic Parsing for Named Entities by Gregory A. Marton Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science at the MASSACHUSETTS INSTITUTE OF … Cited by 5 Related articles All 3 versions

[PDF] from rwth-aachen.de Hierarchical Phrase-Based Translation with Jane 2 M Huck, JT Peter, M Freitag, S Peitz, H Ney – The Prague Bulletin of …, 2012 – Versita … We work with a parallel training corpus of 3.0M Chinese-English sentence pairs (77.5M Chinese / 81.0M English running words). The English target side of the data is lowercased, truecasing is part of the postprocessing pipeline. … Related articles All 6 versions

[PDF] from eurasip.org [PDF] ASR DOMAIN ADAPTATION METHODS FOR LOW-RESOURCED LANGUAGES: APPLICATION TO ROMANIAN LANGUAGE H Cucu12, L Besacier, C Burileanu, A Buzo – 2012 – eurasip.org … 151-168, 2010. [9] L. Lita, A. Ittycheriah, S. Roukos, N. Kambhatla, “tRuEcasIng,” ACL 2003, Sapporo, Japan, p.152-159, 2003. [10] A. Stolcke, “SRILM – an extensible language modeling toolkit,” ICSLP 2002, Colorado, USA, 2002. … Related articles

[PDF] from nist.gov [PDF] TAC entity linking by performing full-document entity extraction and disambiguation S Cucerzan – Proc. of TAC, 2011 – nist.gov Page 1. TAC Entity Linking by Performing Full-document Entity Extraction and Disambiguation Silviu Cucerzan Microsoft Research Machine Learning Group November 15, 2011 Gaithersburg, MD Page 2. KBP Entity Linking – Task Description … Cited by 2 Related articles All 4 versions

[PDF] from inesc-id.pt Temporal Issues and Recognition Errors on the Capitalization of Speech Transcriptions F Batista, N Mamede, I Trancoso – Text, Speech and Dialogue, 2008 – Springer … References 1. Chelba, C., Acero, A.: Adaptation of maximum entropy capitalizer: Little data can help a lot. In: EMNLP 2004 (2004) 2. Lita, LV, Ittycheriah, A., Roukos, S., Kambhatla, N.: tRuEcasIng. In: Proc. of the 41st annual meeting on ACL, Morristown, NJ, USA, pp. … Cited by 1 Related articles All 5 versions

[PDF] from 96.126.103.184 [PDF] Stanford University’s Chinese-to-English statistical machine translation system for the 2008 NIST evaluation M Galley, PC Chang, D Cer, JR Finkel… – Proceedings of the …, 2008 – 96.126.103.184 … on Machine Learning. Lucian Vlad Lita, Abe Ittycheriah, Salim Roukos, and Nanda Kambhatla. 2003. tRuEcasIng. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 152–159. Franz Josef Och and Hermann Ney. 2003. … Cited by 1 Related articles All 22 versions

[PDF] from psu.edu [PDF] A lightweight on-the-fly capitalization system for automatic speech recognition F Batista, N Mamede, D Caseiro, I Trancoso – development, 2008 – Citeseer … Evaluation results are presented both for written newspaper corpora and speech transcriptions of broadcast news corpora. Keywords Rich transcription, capitalization, truecasing, maximum entropy, language models, weighted finite state transducers 1 Introduction … Cited by 6 Related articles All 7 versions

Building a translation lexicon from comparable, non-parallel corpora D Marcu, K Knight, DS Munteanu, P Koehn – US Patent 8,234,106, 2012 – Google Patents Page 1. (12) United States Patent Marcu et al. US008234106B2 US 8,234,106 B2 Jul. 31, 2012 (10) Patent No.: (45) Date of Patent: (54) (75) (73) (21) (22) (65) (63) (60) (51) (52) (58) BUILDING A TRANSLATION LEXICON FROM … Related articles All 4 versions

[PDF] from washington.edu [PDF] The University of Washington machine translation system for IWSLT 2006 K Kirchhoff, K Duh, C Lim – Proceedings of IWSLT 2006, 2006 – ssli.ee.washington.edu … ff ocus uses LMs trained on the BTEC training set only, while ff ocusF includes Fisher data as well. 6. Postprocessing For postprocessing we use a hidden-event n-gram model [14, 15] to restore punctuation and a noisy-channel model for truecasing. … Cited by 5 Related articles All 22 versions

[PDF] from inesc-id.pt Impact of dynamic model adaptation beyond speech recognition F Batista, R Amaral, I Trancoso… – … Workshop, 2008. SLT …, 2008 – ieeexplore.ieee.org … Interspeech 2007, Sep. 2007. [3] Lucian Vlad Lita, Abe Ittycheriah, Salim Roukos, and Nanda Kambhatla, “tRuEcasIng,” in Proc. of the 41st annual meeting on ACL, Morristown, NJ, USA, 2003, pp. 152–159, ACL. [4] C. Chelba … Cited by 2 Related articles All 3 versions

[PDF] from upenn.edu [PDF] Factored translation models P Koehn, H Hoang – Proceedings of the 2007 joint conference on …, 2007 – acl.ldc.upenn.edu … Lita, LV, Ittycheriah, A., Roukos, S., and Kambhatla, N. (2003). tRuEcasIng. In Hinrichs, E. and Roth, D., editors, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 152–159. Melamed, ID (2004). … Cited by 258 Related articles All 33 versions

[PDF] from 128.220.117.40 [PDF] The Johns Hopkins University 2004 Chinese-English and Arabic-English MT Evaluation Systems S Kumar, Y Deng, C Schafer, W Kim… – DARPA/NIST …, 2004 – 128.220.117.40 … 8 Nbest from 6 30.0 28.8 27.6 27.8 39.6 42.2 36.5s Eval04 results are case-sensitive BLEU – Truecasing was performed using WS ’03 capitalizer trained on in-domain English text AE submitted systems: Primary (2), Very Late Contrast (6 and 8) … Cited by 1 Related articles All 10 versions

[PDF] from ed.ac.uk [PDF] Edinburgh system description for the 2006 TC-STAR spoken language translation evaluation A Arun, A Axelrod, AB Mayne… – … on Speech-to- …, 2006 – homepages.inf.ed.ac.uk … In truecasing, the first letter of the first word of each sentence (unless it is a fully capitalized word) is lowercased. Using a truecased corpus, we would expect our phrase table to be less sparse and learn more accurate phrase translation probabilities. … Cited by 4 Related articles All 6 versions

[PDF] from ipl.pt Learning techniques for automatic email message tagging T Tam – 2011 – repositorio.ipl.pt Page 1. INSTITUTO SUPERIOR DE ENGENHARIA DE LISBOA Área Departamental de Engenharia de Electrónica e Telecomunicações e de Computadores Mestrado em Engenharia Informática e de Computadores (Perfil de Sistemas de Informação) … Related articles

Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts F Batista, H Moniz, I Trancoso… – Audio, Speech, and …, 2012 – ieeexplore.ieee.org … SU boundaries provide a basis for further natural language processing, and their impact on subsequent tasks has been analyzed in many speech processing studies [5]–[7]. The capitalization task, also known as truecasing [8], consists of assigning to each word of an input … Cited by 2 Related articles All 3 versions

[PDF] from aclweb.org The best lexical metric for phrase-based statistical MT system optimization D Cer, CD Manning, D Jurafsky – … : The 2010 Annual Conference of the …, 2010 – dl.acm.org Page 1. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pages 555–563, Los Angeles, California, June 2010. ©2010 Association for Computational Linguistics The Best Lexical Metric for … Cited by 18 Related articles All 22 versions

[PDF] from mt-archive.info [PDF] The RWTH machine translation system for IWSLT 2007 A Mauser, D Vilar, G Leusch, Y Zhang… – … Workshop on Spoken …, 2007 – mt-archive.info … procedure. This especially holds for the syntactic models. Truecasing was done after system combination using the SRI disambig tool with a language model trained on the supplied training data. 7.1. Progress over Time In … Cited by 11 Related articles All 7 versions

[PDF] from ed.ac.uk [PDF] Randomised Features in Discriminative Machine Learning G Gallone – 2008 – homepages.inf.ed.ac.uk … C.3 Collaboration diagram for maxent::RandLBFGSTrainer … LIST OF TABLES 3.1 Classes of words and corresponding tags in our test-case; 3.2 Feature templates in our truecasing example; 4.1 Experimental Setup … Related articles All 13 versions

[PDF] from nrc-cnrc.gc.ca [PDF] Portage phrase-based system for chinese-to-english translation R Kuhn, G Foster, S Larkin, N Ueffing – 2006 – nparc.cisti-icist.nrc-cnrc.gc.ca … 6 References Agbago, A., Kuhn, R., and Foster, G. (2005). Truecasing for the Portage System. In Int. Conf. on Recent Advances in Natural Language Processing (RANLP). Borovets, Bulgaria: pp. 25-31. Babych, B., and Hartley, A. (2004). … Cited by 3 Related articles All 8 versions

[PDF] from upenn.edu Adaptation of maximum entropy capitalizer: Little data can help a lot C Chelba, A Acero – Computer Speech & Language, 2006 – Elsevier Cited by 113 Related articles All 24 versions

[PDF] from euromatrix.net [PDF] 2.1: Factored Translation Model Prototype System P Koehn, H Hoang – 2007 – euromatrix.net … Lita, LV, Ittycheriah, A., Roukos, S., and Kambhatla, N. (2003). tRuEcasIng. In Hinrichs, E. and Roth, D., editors, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 152–159. Melamed, ID (2004). … Related articles All 4 versions

[PDF] from arxiv.org Enhanced integrated scoring for cleaning dirty texts W Wong, W Liu, M Bennamoun – arXiv preprint arXiv:0810.0332, 2008 – arxiv.org Page 1. arXiv:0810.0332v1 [cs.AI] 2 Oct 2008 Enhanced Integrated Scoring for Cleaning Dirty Texts Wilson Wong, Wei Liu and Mohammed Bennamoun School of Computer Science and Software Engineering University of Western … Cited by 9 Related articles All 12 versions

[PDF] from upenn.edu [PDF] Monolingual machine translation for paraphrase generation C Quirk, C Brockett, W Dolan – Proceedings of EMNLP, 2004 – acl.ldc.upenn.edu … 1997. A DP Based Search Using Monotone Alignments in Statistical Translation. In Proceedings of the ACL. L. Vita, A. Ittycheriah, S. Roukos, and N. Kambhatla. 2003. tRuEcasing. In Proceedings of the ACL: 152–159. Sapporo, Japan. S. Vogel, H. Ney and C. Tillmann. 1996. … Cited by 143 Related articles All 29 versions

[PDF] from diva-portal.org Text Harmonization Strategies for Phrase-Based Statistical Machine Translation S Stymne – 2012 – liu.diva-portal.org Page 1. Linköping Studies in Science and Technology. Dissertations. No. 1451 Text Harmonization Strategies for Phrase-Based Statistical Machine Translation Sara Stymne Department of Computer and Information Science Linköping University SE-581 83 Linköping, … Related articles

[PDF] from rwth-aachen.de System combination for machine translation of spoken and written language E Matusov, G Leusch, RE Banchs… – Audio, Speech, and …, 2008 – ieeexplore.ieee.org Page 1. 1222 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 7, SEPTEMBER 2008 System Combination for Machine Translation of Spoken and Written Language Evgeny Matusov … Cited by 33 Related articles All 9 versions

[PDF] from crpit.com [PDF] Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text W Wong, W Liu, M Bennamoun – Conferences in Research and Practice in …, 2006 – crpit.com Page 1. Integrated Scoring for Spelling Error Correction, Abbreviation Expansion and Case Restoration in Dirty Text Wilson Wong, Wei Liu and Mohammed Bennamoun School of Computer Science and Software Engineering … Cited by 23 Related articles All 11 versions

[PDF] from llnl.gov [PDF] Storage-intensive supercomputing benchmark study J Cohen, D Dossa, M Gokhale… – … , Tech. Rep. UCRL- …, 2007 – e-reports-ext.llnl.gov Page 1. UCRL-TR-236179 Storage-Intensive Supercomputing Benchmark Study J. Cohen, D. Dossa, M. Gokhale, D. Hysom, J. May, R. Pearce, A. Yoo November 5, 2007 Page 2. Disclaimer This document was prepared as an … Cited by 4 Related articles All 3 versions

Rewriting the orthography of SMS messages F Yvon – Natural Language Engineering, 2010 – Cambridge Univ Press Page 1. Natural Language Engineering 16 (2): 133–159. © Cambridge University Press 2010 doi:10.1017/S1351324909990258 Rewriting the orthography of SMS messages FRANÇOIS YVON LIMSI-CNRS and Université Paris Sud 11, Paris, France e-mail: yvon@limsi.fr … Cited by 10 Related articles All 4 versions

[PDF] from psu.edu Email data cleaning J Tang, H Li, Y Cao, Z Tang – Proceedings of the eleventh ACM SIGKDD …, 2005 – dl.acm.org Page 1. Email Data Cleaning Jie Tang Department of Computer Science Tsinghua University 12#109, Tsinghua University Beijing, China, 100084 j-tang02@mails.tsinghua.edu.cn Hang Li, Yunbo Cao Microsoft Research Asia … Cited by 49 Related articles All 15 versions

[PDF] from nrc-cnrc.gc.ca [PDF] Système de traduction automatique statistique combinant différentes ressources F Sadat, G Foster, R Kuhn – 2006 – nparc.cisti-icist.nrc-cnrc.gc.ca … References AGBAGO A., KUHN R., and FOSTER G. (2005). Truecasing for the PORTAGE System. Proceedings of the International Conference on Recent Advances in Natural Language Processing RANLP 2005, Borovets, Bulgaria, 21-23 September 2005. … Cited by 3 Related articles All 5 versions

[PDF] from aclweb.org [PDF] Scaling high-order character language models to gigabytes B Carpenter – Proceedings of the ACL Workshop on Software, 2005 – aclweb.org Page 96. Proceedings of the ACL 2005 Workshop on Software, pages 86–99, Ann Arbor, June 2005. ©2005 Association for Computational Linguistics Scaling High-Order Character Language Models to Gigabytes Bob Carpenter Alias-i, Inc. … Cited by 25 Related articles All 14 versions

[PDF] from aaaipress.org Orthographic case restoration using supervised learning without manual annotation NIU CHENG, LI WEI, J Ding… – International Journal on …, 2004 – World Scientific Page 1. International Journal on Artificial Intelligence Tools, Vol. 13, No. 1 (2004) 141–156. © World Scientific Publishing Company, www.worldscientific.com. ORTHOGRAPHIC … Cited by 7 Related articles BL Direct All 8 versions

[PDF] from ed.ac.uk [PDF] Joint Learning for Named Entity Recognition and Capitalization Generation A Khare – 2006 – inf.ed.ac.uk … 10 Lita et al. [20] call the capitalization generation process as ‘truecasing’ and propose a statistical language modelling approach to the problem. They employ a trigram language model for local context and use Viterbi decoding for sentence level inference. … Cited by 1 Related articles All 5 versions

[PDF] from uccs.edu [PDF] Syntactic normalization of Twitter messages M Kaufmann, J Kalita – International Conference on Natural Language …, 2010 – cs.uccs.edu … ACM. [12] Lucian Vlad Lita, Abe Ittycheriah, Salim Roukos, and Nanda Kambhatla. truecasing. In ACL ’03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pages 152–159, Morristown, NJ, USA, 2003. … Cited by 22 Related articles All 5 versions

[PDF] from cam.ac.uk [PDF] Current Research in Phrase-Based Statistical Machine Translation and some links to ASR B Byrne – Work, 2005 – svr-www.eng.cam.ac.uk Page 1. Current Research in Phrase-Based Statistical Machine Translation and some links to ASR Bill Byrne Cambridge University Engineering Department wjb31@eng.cam.ac.uk 4 May 2005 Work done with Shankar Kumar and Yonggang Deng … Related articles All 6 versions

[PDF] from psu.edu [PDF] Maximum Entropy Modeling and Semantic Concept Detection J Argillander – 2005 – Citeseer Page 1. HELSINKI UNIVERSITY OF TECHNOLOGY Department of Electrical and Communications Engineering Laboratory of Acoustics and Audio Signal Processing Janne Argillander Maximum Entropy Modeling and Semantic Concept Detection … Related articles All 4 versions

[PDF] from cam.ac.uk [PDF] Current Research in Statistical Machine Translation and Links with Automatic Speech Recognition B Byrne – 2004 – svr-www.eng.cam.ac.uk Page 1. Current Research in Statistical Machine Translation and Links with Automatic Speech Recognition Bill Byrne with Shankar Kumar and Yonggang Deng Department of Engineering, Cambridge University Trumpington … Related articles All 2 versions

[PDF] from buffalo.edu Question Answering Supported by Multiple Levels of Information Extraction R Srihari, W Li, X Li – Advances in Open Domain Question Answering, 2006 – Springer Page 1. ROHINI K. SRIHARI, WEI LI AND XIAOGE LI QUESTION ANSWERING SUPPORTED BY MULTIPLE LEVELS OF INFORMATION EXTRACTION Abstract: This chapter discusses the importance of information extraction (IE) in question answering (QA) systems. … Cited by 4 Related articles All 3 versions