Notes:
Corpus annotation is the process of adding linguistic or interpretive information to a collection of texts (a corpus) so that the texts are easier to analyze and interpret. Annotations can be attached to individual words or phrases, for example to mark part of speech or word sense, or to whole sentences or paragraphs to mark their structure or meaning.
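As a rough illustration of word-level and sentence-level annotation, the sketch below stores each annotation as a labelled character span over the raw text. The `Annotation` class, the layer names, and the tag values are assumptions made for this example; they do not come from any particular tool or tagset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    layer: str  # hypothetical layer name, e.g. "pos" or "sentence"
    label: str  # the tag chosen by the annotator


text = "Book a table for two."

# Word-level part-of-speech tags plus one sentence-level label.
annotations = [
    Annotation(0, 4, "pos", "VERB"),   # "Book" is a verb here, not a noun
    Annotation(5, 6, "pos", "DET"),
    Annotation(7, 12, "pos", "NOUN"),
    Annotation(0, 21, "sentence", "request"),
]


def covered_text(text, ann):
    """Return the stretch of text that an annotation refers to."""
    return text[ann.start:ann.end]


def by_layer(annotations, layer):
    """Select the annotations that belong to one layer."""
    return [a for a in annotations if a.layer == layer]


pos_pairs = [(covered_text(text, a), a.label) for a in by_layer(annotations, "pos")]
print(pos_pairs)
```

Keeping annotations separate from the text in this way leaves the original text untouched and lets several layers coexist over the same words.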
In dialog systems, corpus annotation is used to create training data for the natural language processing (NLP) models that interpret user input and generate responses. A model trained on a large collection of annotated dialogues can learn the structure and meaning of the language used in dialogues and produce appropriate responses.
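To make the link between annotated dialogues and training data concrete, here is a minimal sketch. The dialogue, the `act` labels, and both helper functions are invented for illustration; a real project would follow its own annotation scheme and data format.

```python
# One annotated dialogue: each turn carries a speaker and a hypothetical
# dialogue-act label assigned by a human annotator.
dialogue = [
    {"speaker": "user", "text": "Hi, I need a taxi.", "act": "request"},
    {"speaker": "system", "text": "Where are you going?", "act": "ask_destination"},
    {"speaker": "user", "text": "To the airport.", "act": "inform"},
    {"speaker": "system", "text": "A taxi is on its way.", "act": "confirm"},
]


def understanding_examples(dialogue):
    """(utterance, act) pairs for training a classifier on user turns."""
    return [(t["text"], t["act"]) for t in dialogue if t["speaker"] == "user"]


def response_examples(dialogue):
    """(user utterance, system reply) pairs taken from adjacent turns."""
    pairs = []
    for prev, cur in zip(dialogue, dialogue[1:]):
        if prev["speaker"] == "user" and cur["speaker"] == "system":
            pairs.append((prev["text"], cur["text"]))
    return pairs


print(understanding_examples(dialogue))
print(response_examples(dialogue))
```

The first function yields examples for interpreting user input, the second yields examples for choosing a response, which mirrors the two uses of annotated dialogues described above.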
Corpus annotation can be a time-consuming and labor-intensive process, as it requires humans to carefully read and analyze each piece of text in the corpus and add the appropriate annotations. However, the resulting annotated corpus can be a valuable resource for building and improving dialog systems, as it provides a large and diverse set of examples of how language is used in dialogues.
There are several tools that can be used for corpus annotation, including:
- BRAT (Brat Rapid Annotation Tool): An open-source tool for annotating text with tags or labels, and for visualizing the annotations in a web-based interface.
- Prodigy: A commercial software tool for creating and annotating training data for NLP models, with a focus on high efficiency and ease of use.
- TextAnnotator: A web-based tool for annotating text with tags or labels, with features such as automatic annotation suggestions and support for multiple languages.
- AnnoTate: A web-based tool for annotating text with tags or labels, with support for multiple annotation schemes and the ability to export annotations in a variety of formats.
- INCEpTION: An open-source, web-based platform for annotating and analyzing text, with support for multiple annotation schemes and a range of visualization and analysis features.
- ELAN: A software tool for annotating and analyzing multimedia data, including audio and video, with support for multiple annotation schemes and a range of visualization and analysis features.
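Most of these tools can export annotations for further processing. As one example, the sketch below reads text-bound annotations from brat's standoff `.ann` format, assuming the commonly documented layout of `ID<TAB>Type start end<TAB>covered text`. The function name and the sample data are made up for this example, and discontinuous spans (written with `;`) and non-entity lines are simply skipped.

```python
def parse_text_bound(ann_content):
    """Parse contiguous text-bound ("T") annotations from brat standoff text."""
    entities = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # relations, events, notes, etc. are ignored in this sketch
        ann_id, type_and_span, covered = line.split("\t", 2)
        if ";" in type_and_span:
            continue  # discontinuous span; not handled here
        label, start, end = type_and_span.split(" ")
        entities.append(
            {
                "id": ann_id,
                "label": label,
                "start": int(start),
                "end": int(end),
                "text": covered,
            }
        )
    return entities


# Hypothetical document text and its matching .ann content.
sample_text = "Sony opened an office in Berlin."
sample_ann = (
    "T1\tOrganization 0 4\tSony\n"
    "T2\tLocation 25 31\tBerlin\n"
    "R1\tLocated-In Arg1:T1 Arg2:T2\n"
)

entities = parse_text_bound(sample_ann)
for e in entities:
    # Offsets should point back at the same characters in the source text.
    print(e["id"], e["label"], sample_text[e["start"]:e["end"]])
```

Checking each offset against the source text, as the loop does, is a cheap way to catch annotation files that have drifted out of sync with their documents.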
Resources:
- Corpora and Corpus Annotation Tools on the WWW (collected 2002)
- Linguistic Annotation Wiki (2011: describes tools and formats for creating and managing linguistic annotations)
See also:
[PDF] Towards the integrationn of synthetic sl annimation with avatars into corpus annotation tools [PDF] from uea.ac.uk R Elliott, J Bueno, R Kennaway… – 4th Workshop on the …, 2010 – vhg.cmp.uea.ac.uk Abstract We outline the main features of our synthetic virtual human sign language system, JASigning. We describe how we have extended its input notation, SiGML, to allow explicit control of performance time, and we describe our initial steps on the path to integrating … Cited by 4 – Related articles – View as HTML
[PDF] Assigning Wh-Questions to Verbal Arguments: Annotation Tools Evaluation and Corpus Building [PDF] from lrec-conf.org M Duran, M Amâncio, S Aluísio, K Choukri… – in LREC, 2010 – lrec-conf.org … We analyzed it in order not to loose the opportunity of finding a more appropriate tool for our task. NITE is a multimodal corpus annotation tool that meets several of our requirements: it is multi-level, open source and updated. … Cited by 3 – Related articles – View as HTML – All 2 versions
[PDF] AnCoraPipe: A tool for multilevel annotation [PDF] from sepln.org M Bertran, O Borrega, M Recasens… – … del lenguaje Natural, 2008 – sepln.org … Palabras clave: Lingüística de corpus, herramienta de anotación, niveles de anotación. Abstract: AnCoraPipe is a corpus annotation tool which allows different linguistic levels to be annotated simultaneously and efficiently, since it uses a single format for all stages. … Cited by 11 – Related articles – View as HTML – All 7 versions
[PDF] The UAM CorpusTool: Software for corpus annotation and exploration [PDF] from uam.es M O’Donnell – Proceedings of the XXVI Congreso de AESLA, 2008 – uam.es … These functionalities will be explored below. UAM CorpusTool is free, and works on Macintosh and Windows. It is perhaps the most user-friendly of the corpus annotation tools currently available, and is well- documented. This … Cited by 7 – Related articles – View as HTML – All 4 versions
Overview of the Ninth Annual Meeting of the BioLINK SIG at ISMB: Linking Literature, Information and Knowledge for Biology C Blaschke, L Hirschman, H Shatkay… – … , and Knowledge for …, 2010 – Springer … protein-protein interac- tions, analysis of experimental data and hypothesis generation, linking text to data- bases entries, text mining systems in support of specific users and applications, augmentation of the gene ontology using text mining, corpus annotation tools, and image … Related articles – All 2 versions
[PDF] Multiple purpose annotation using SLAT-Segment and link-based annotation tool– [PDF] from colorado.edu M Noguchi, K Miyoshi, T Tokunaga… – The Workshop …, 2008 – verbs.colorado.edu … http://verbs. colorado. edu/mpalmer/ projects/ace/PBguidelines. pdf Takahashi Tetsuro, Inui Kentaro.(2006). A multi-purpose corpus annotation tool: Tagrin. Proceedings of the 12th Annual Conference on Natural Language Processing. pp. 228-231. Yokohama. Japan. … Cited by 10 – Related articles – View as HTML – All 11 versions
[PDF] Slat 2.0: Corpus construction and annotation process management [PDF] from titech.ac.jp D Kaplan, R Iida… – … of the 16th Annual Meeting of …, 2010 – xxx.cl.cs.titech.ac.jp … Association for Compu- tational Linguistics. Tetsuro Takahashi and Kentaro Inui. 2006. A multi- purpose corpus annotation tool: Tagrin. Proc. of the 12th Annual Conference on Natural Language Process- ing, pages 228-231. – 513 – Cited by 1 – Related articles – All 3 versions
[PDF] The leeds arabic discourse treebank: Annotating discourse connectives for arabic [PDF] from lrec-conf.org A Al-Saif… – Language Resources and Evaluation …, 2010 – lrec-conf.org … A brief description of the annotation scheme follows in Sec- tion 4. The corpus, annotation tool, and the annotation methodology are discussed in Section 5. The results of our agreement studies and the gold standard corpus … Cited by 5 – Related articles – View as HTML – All 5 versions
[PDF] Annotating a historical corpus of German: A case study [PDF] from lrec-conf.org P Bennett, M Durrell, S Scheible… – Proceedings of the LREC …, 2010 – lrec-conf.org … corpus which can be used to carry out reliable corpus- linguistic studies of Early Modern German, we also plan to make the following contributions: • Provide detailed annotation guidelines for all proposed annotations • Test and evaluate current corpus annotation tools on gold … Cited by 1 – Related articles – View as HTML
[PDF] Annotation Process Management Revisited [PDF] from titech.ac.jp D Kaplan, R Iida, T Tokunaga – Proceedings of the …, 2010 – tanaka-www.cs.titech.ac.jp … Association for Compu- tational Linguistics. Tetsuro Takahashi and Kentaro Inui. 2006. A multi- purpose corpus annotation tool: Tagrin. Proc. of the 12th Annual Conference on Natural Language Process- ing, pages 228-231. 3661 Cited by 1 – Related articles – View as HTML – All 5 versions
Corpus annotation/management tools for the project: balanced corpus of contemporary written Japanese Y Matsumoto – Large-Scale Knowledge Resources. Construction and …, 2008 – Springer … Figure 1 summarizes the relationship between corpus annotation tools and annotated corpus maintenance tools. In the figure, Cradle is a database system 1 http://chasen.naist.jp/hiki/ChaKi/ Page 3. 108 Y. Matsumoto Large scale corpus (text data) M a ch ine Le a rning … Related articles – BL Direct – All 3 versions
[PDF] A corpus for cross-document co-reference [PDF] from unipi.it D Day, J Hitzeman, M Wick, K Crouch… – Proceedings of …, 2008 – mailserver.di.unipi.it … 2. Annotation Methods In order to create the cross-document co-reference corpus, we made use of the previously developed Callisto/EDNA annotation tool. This is a specialized annotation task plug-in for the Callisto corpus annotation tool (http://callisto.mitre.org/). … Cited by 2 – Related articles – View as HTML – All 13 versions
Cognos: a pragmatic annotation toolkit for the acquisition of natural interaction knowledge [PDF] from ua.es C Gómez, F Javier, E Albacete García… – 2011 – rua.ua.es … Palabras calve: Anotación pragmática, corpus de Interacción Natural, herramienta de Abstract: This paper describes some corpus annotation tools focused in pragmatic knowledge. … NOMOS (Niekrasz and Gruenstein, 2006) is also a multimodal corpus annotation tool. … Related articles
Cognos: A Pragmatic Annotation Toolkit for the Acquisition of Natural Interaction Knowledge [PDF] from sepln.org FJ Calle, E Albacete, G Olaziregi… – … de Lenguaje Natural, 2011 – journal.sepln.org … Palabras calve: Anotación pragmática, corpus de Interacción Natural, herramienta de Abstract: This paper describes some corpus annotation tools focused in pragmatic knowledge. … NOMOS (Niekrasz and Gruenstein, 2006) is also a multimodal corpus annotation tool. … Related articles – All 3 versions
[PDF] The Creagest Project: a Digitized and Annotated Corpus for French Sign Language (LSF) and Natural Gestural Languages [PDF] from cis.gouv.fr A Balvet, C Courtin, D Boutet, C Cuxac… – Proceedings of the …, 2010 – cis.gouv.fr … Therefore, two development sub-tasks have been devised. Elan companion tools The ELAN platform14 is our main corpus annotation tool, the participants in this sub-project are therefore in close connection with the ELAN development group at Max Planck Institute. … Related articles – View as HTML – All 4 versions
[PDF] DutchSemCor: Building a semantically annotated corpus for Dutch [PDF] from trojina.si P Vossen, A Görög, F Laan, M van Gompel… – Proceedings of …, 2011 – trojina.si … they want to annotate. After selection, snippets are automatically tokenised, part-of-speech tagged and lemmatised using Frog3 and made available in the corpus annotation tool for assigning the sense. The final DutchSemCor … View as HTML
[HTML] Parenthetically Speaking: Classifying the Contents of Parentheses for Text Mining [HTML] from nih.gov KB Cohen, T Christiansen… – AMIA Annual Symposium …, 2011 – ncbi.nlm.nih.gov … Acknowledgments. Kristina Williams refined the categories and did the corpus annotation. Michael Bada assisted with the set-up of the corpus annotation tool. William A. Baumgartner Jr. and Chris Roeder helped clarify our thinking about Unicode. …
[DOC] Annotating a multigenre corpus of Early Modern German [DOC] from lancs.ac.uk P Bennet, M Durrel, S Scheible… – Corpus Linguistics 2009, 2009 – ucrel.lancs.ac.uk … This study addresses the challenges in annotating a spatialised multi-genre corpus of Early Modern German with linguistic information, and describes how the data can be used to carry out a systematic evaluation of state-of-the-art corpus annotation tools on historical data, with … Cited by 1 – Related articles – View as HTML
Ontology-based information extraction: An introduction and a survey of current approaches [PDF] from psu.edu DC Wimalasuriya, D Dou – Journal of Information Science, 2010 – jis.sagepub.com Page 1. Journal of Information Science, 36 (3) 2010, pp. 306-323 (c) The Author(s), DOI: 10.1177/0165551509360123 306 Ontology-based information extraction: An introduction and a survey of current approaches Daya C. Wimalasuriya and Dejing Dou … Cited by 20 – Related articles – All 11 versions
[PDF] The TDIL program and the Indian Language Corpora Initiative (ILCI) [PDF] from ffzg.hr GN Jha – Proceedings of the Seventh Conference on …, 2010 – hnk.ffzg.hr … On an average of 10 words in a sentence would mean a corpora of 60,000,00 annotated words Tools • Corpus annotation tool • KWIC identifier • Stemmer • Affix list builder • Frequency list builder • Named Entity lists builder … Cited by 6 – Related articles – View as HTML – All 3 versions
[PDF] Arabic anaphora resolution: Corpora annotation with coreferential links” [PDF] from ccis2k.org S Hammami, L Belguith… – The International Arab Journal …, 2009 – ccis2k.org … Our aim is to build a real corpus which will be used for anaphora resolution (ie, either for system training or evaluation). Keywords: Anaphora resolution, Arabic language, corpus annotation tool, pronominal anaphora, lexical anaphora. … Cited by 2 – Related articles – View as HTML – All 4 versions
[PDF] Tagging Amazigh with AnCoraPipe [PDF] from ua.es M Outahajala, L Zekouar, P Rosso… – Editors & Workshop …, 2010 – repository.dlsi.ua.es … wedding arrives”). 4. AnCoraPipe tool AnCoraPipe (Bertran et al. 2008) is a corpus annotation tool which allows different linguistic levels to be annotated efficiently, since it uses the same format for all stages. The tool reduces … Cited by 5 – Related articles – View as HTML – All 6 versions
[PDF] Building a Corpus-based Historical Portuguese Dictionary: Challenges and Opportunities [PDF] from atala.org AC Junior… – Traitement Automatique des Langues, 2009 – atala.org … and Jones, 2006). Besides, it is useless to apply corpus annotation tools trained on contemporary language data to historical texts, since they will not deal with the spelling variants of a word (Rayson et al., 2005). Whenever a … Cited by 2 – Related articles – View as HTML – All 3 versions
Corpora in translator training K Kunz, S Castagnoli… – Why translation studies matters, 2010 – books.google.com … However, other “open-ended” exercises in the form of assignments (eg building a corpus, solv- ing specific translation problems, applying a corpus annotation tool) requiring assess- ment and feedback by a tutor have also been devised for application to blended learn- ing … Related articles
[PDF] Building an annotated corpus for Amazighe [PDF] from ua.es M Outahajala, L Zenkouar… – Will appear in …, 2011 – repository.dlsi.ua.es … With this plugin, all features included in Eclipse are made available for corpus annotation and developing. AncoraPipe is a corpus annotation tool which allows different linguistic levels to be annotated efficiently by (Bertran et al. … Cited by 2 – View as HTML
[PDF] Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora [PDF] from lancs.ac.uk P Rayson, D Archer, A Baron, J Culpeper… – 2007 – comp.eprints.lancs.ac.uk … collocation statistics, and n-grams. However, our focus in this paper is investigating the problems caused for automatic corpus annotation tools and, in particular, part-of- speech taggers. 2. Background Many existing historical corpora … Cited by 20 – Related articles – View as HTML – All 9 versions
[PDF] Query-based Annotation and the Sumerian Verbal Prefixes [PDF] from toronto.edu EJM Smith – 2010 – csri.toronto.edu … 42 3.4 LPattern operators . . . . . 44 3.5 Functions of corpus/annotation tools (McEnery and Rayson, 1997) . . . . 48 3.6 Storageofqueryobjects . . . . . 49 3.7 Summaryofcurrentannotations . . . . . … Related articles – View as HTML – All 20 versions
Revisiting the impact of different annotation schemes on PCFG parsing: a grammatical dependency evaluation [PDF] from aclweb.org A Boyd… – Proceedings of the Workshop on Parsing German, 2008 – dl.acm.org … Around 30% of sentences in Negra contain at least one discontinuity. To remove discontinuities, we used the conversion program included with the Negra corpus annotation tools (Brants and Plaehn, 2000), the same tool used in Kübler et al. … Cited by 5 – Related articles – All 11 versions
Retrieving relatives from historical data M Hundt, D Denison… – Literary and Linguistic Computing, 2011 – ALLC … Recent developments in robust corpus annotation tools have made parsing of corpora much easier. This kind of syntactic annotation, in turn, makes retrieval of zero relatives a more realistic goal. Parser output has only been tested for Present-Day English data, so far. …
[BOOK] Phraseology in Corpus-Based Translation Studies M Ji – 2010 – books.google.com … annotation scheme, which through an exchange of two major procedures in corpus data mining, ie the linguistic annotation of the corpus and the automatic extraction of corpus data, could help keep in balance the developing nature of many corpus annotation tools and the … Cited by 2 – Related articles – Library Search – All 3 versions
[PDF] Extending the SiGML Notation-a Progress Report [PDF] from uea.ac.uk J Glauert… – 2011 – vhg.cmp.uea.ac.uk … of the features described here. 8. REFERENCES [1] R. Elliott, J. Bueno, R. Kennaway, and J. Glauert. Towards the integrationn of synthetic sl annimation with avatars into corpus annotation tools. In T. Hanke, editor, 4th Workshop … Related articles – View as HTML
[CITATION] A multi-purpose corpus annotation tool: Tagrin T Tetsuro… – Proceedings of the 12th Annual Conference on Natural …, 2006 Cited by 4 – Related articles
[PDF] A Flexible Annotation-Based Architecture for Intelligent Language Tutoring Systems [PDF] from uni-tuebingen.de R Ziai – 2009 – sfs.uni-tuebingen.de Page 1. Universität T ¨ubingen Seminar f ¨ur Sprachwissenschaft Wilhelmstraße 19 72074 T ¨ubingen MA Thesis in Computational Linguistics A Flexible Annotation-Based Architecture for Intelligent Language Tutoring Systems Ramon Ziai rziai@sfs.uni-tuebingen.de … Cited by 1 – Related articles – View as HTML
AnCoraPipe [PDF] from unirioja.es M Bertran Ibarz, O Borrega Cepa… – … del lenguaje natural, 2008 – dialnet.unirioja.es … Palabras clave: Lingüística de corpus, herramienta de anotación, niveles de anotación. Abstract: AnCoraPipe is a corpus annotation tool which allows different linguistic levels to be annotated simultaneously and efficiently, since it uses a single format for all stages. …
AnCoraPipe: a tool for multilevel annotation [PDF] from ua.es M Bertran Ibarz, O Borrega Cepa, M Recasens Potau… – 2008 – rua.ua.es … Palabras clave: Lingüística de corpus, herramienta de anotación, niveles de anotación. Abstract: AnCoraPipe is a corpus annotation tool which allows different linguistic levels to be annotated simultaneously and efficiently, since it uses a single format for all stages. … All 2 versions
[PDF] I. Toward the Common Platform of Digital Slavic Lexicographic Resources [PDF] from ijs.si T Erjavec… – Tomaž Erjavec, Jan Jona Javoršek, 2008 – nl.ijs.si … extension with up-loaded files 6. text statistics over uploaded corpora: keyness, terms 7. access management Corpus Annotation with Totale We propose for the first prototype processing pipeline to implement in the scope of the VO the corpus annotation tool “totale”(Erjavec et al … Related articles – View as HTML – All 4 versions
Natural language technology for information integration in business intelligence [PDF] from shef.ac.uk D Maynard, H Saggion, M Yankova… – Business Information …, 2007 – Springer … Finance, World Bank, CIA Fact Book) have been targeted in order to boost system accuracy. We rely on the Ontology-based Corpus Annotation Tool (OCAT), a GATE plugin which uses one or more ontologies for annotation of concepts/classes. … Cited by 10 – Related articles – BL Direct – All 8 versions
Developing tool for crosscutting concern identification using NLP BS Ali… – Information Technology, 2008. ITSim …, 2008 – ieeexplore.ieee.org … The first corpus annotation tool applied to the text is the hybrid POS tagger, CLAWS [11] which assigns a POS tag to every word in running text with about 97% accuracy. A second layer of annotation is applied by SEMTAG, a semantic tagger [12]. … Cited by 1 – Related articles
Unsupervised topic modelling for multi-party spoken discourse [PDF] from upenn.edu M Purver, TL Griffiths, KP Körding… – Proceedings of the 21st …, 2006 – dl.acm.org … We thank Elizabeth Shriberg and Andreas Stolcke for pro- viding automatic speech recognition data for the ICSI corpus and for their helpful advice; John Niekrasz and Alex Gruenstein for help with the NOMOS corpus annotation tool; and Michel Gal- ley for discussion of his … Cited by 56 – Related articles – BL Direct – All 43 versions
How (not) to select your voice corpus: random selection vs. phonologically balanced [PDF] from toshiba-europe.com T Lambert, N Braunschweiler… – Proceedings of the 6th …, 2007 – isca-speech.org Page 1. 264 6th ISCA Workshop on Speech Synthesis, Bonn, Germany, August 22-24, 2007 How (Not) to Select Your Voice Corpus: Random Selection vs. PhonologicallyBalanced Tanya Lambert§, Norbert Braunschweiler‡, Sabine Buchholz ‡ … Cited by 5 – Related articles – All 4 versions
Automatic rule learning exploiting morphological features for named entity recognition in Turkish S Tatar… – Journal of Information Science, 2011 – jis.sagepub.com … The experiments were conducted on the TurkIE corpus, generated in support of this study. The developed corpus and the corpus annotation tool are two other major con- tributions of this study, which will encourage and support future researchers in this area. … Related articles – All 4 versions
Online and off-line visualization of meeting information and meeting support [PDF] from utwente.nl A Nijholt, R Rienks, J Zwiers… – The Visual Computer, 2006 – Springer … industrial de- signer. The question arises: will these teams ever agree on a design? 2.2 Corpus annotation and corpus annotation tools How does a development team agree on the design of a re- mote control? To answer such … Cited by 30 – Related articles – Library Search – BL Direct – All 21 versions
[PDF] Search-By-Example in Multilingual Sign Language Databases [PDF] from uea.ac.uk R Elliott, H Cooper, EJ Ong, J Glauert… – 2011 – vhg.cmp.uea.ac.uk … Technologies, Valetta, Malta, May17 – 23 2010. [2] R. Elliott, J. Bueno, R. Kennaway, and J. Glauert. Towards the integrationn of synthetic sl annimation with avatars into corpus annotation tools. In T. Hanke, editor, 4th Workshop … Cited by 1 – Related articles – View as HTML – All 2 versions
[PDF] Automatic detection of spelling variation in historical corpus: An application to build a Brazilian Portuguese spelling variants dictionary [PDF] from lancs.ac.uk R Giusti, A Candido Jr, M Muniz… – Proceedings of the …, 2007 – ucrel.lancs.ac.uk … Jones, 2006). Second, it is useless to apply corpus annotation tools trained on contemporary language data to historical texts, since they will not deal with spelling variants of the same word (Rayson et al., 2007). More recently … Cited by 4 – Related articles – View as HTML – All 2 versions
[BOOK] A glossary of corpus linguistics P Baker, A Hardie… – 2006 – books.google.com Page 1. A Glossary of Corpus Linguistics PAUL BAKER, ANDREW HARDIE & TONY MCENERY Page 2. A GLOSSARY OF CORPUS LINGUISTICS Page 3. TITLES IN THE SERIES INCLUDE Peter Trudgill A Glossary of Sociolinguistics … Cited by 45 – Related articles – Library Search – All 6 versions
Lexicographic Tools and Techniques. [PDF] from bg-openaire.eu L Dimitrova… – 2008 – bg-openaire.eu … The MULTEXT tools were implemented under UNIX. They could be distributed in two main types: corpus annotation tools and corpus exploitation tools – segmenter, morphological analyser, part-of-speech disambiguator, aligner, etc. … Related articles – All 2 versions
[PDF] On Compatibility of Slavic Language Resources [PDF] from ijs.si L Dimitrova… – Tomaž Erjavec, Jan Jona Javoršek, 2008 – nl.ijs.si … The MULTEXT tools were implemented under UNIX. They could be distributed in two main types: corpus annotation tools and corpus exploitation tools-segmenter, morphological analyser, part-of-speech disambiguator, aligner, etc. … Related articles – View as HTML – All 5 versions
Ontology-Based Query Expansion with Latently Related Named Entities for Semantic Text Search [PDF] from vng.com.vn V Ngo… – Advances in Intelligent Information and Database …, 2010 – Springer … For example, “was actress in” is mapped to actedIn, “is author of” is mapped to wrote, and “nationality is” is mapped to isCitizenOf. 3. Recognizing Entities: Entity recognition is implemented by OCAT (Ontol- ogy-based Corpus Annotation Tool) of GATE. … Cited by 1 – Related articles – All 3 versions
[PDF] Chunking-based question type identification for multi-sentence queries [PDF] from psu.edu M Takechi, T Tokunaga… – … of the SIGIR 2007 Workshop on …, 2007 – Citeseer … 8. ACKNOWLEDGMENTS I would like to thank Dr. Taku Kudo and Dr. Tetsuro Taka- hashi for providing us their useful free software of machine learning and corpus annotation tools. 9. REFERENCES [1] R. Iida, K. Inui, and Y. Matsumoto. … Cited by 6 – Related articles – View as HTML – All 10 versions
[PDF] Language resources for Uralic minority languages [PDF] from ua.es A Novák – … : interoperability between people in the creation of …, 2008 – repository.dlsi.ua.es … 6. A web based corpus annotation tool Although morphological analyzers can be used to rapidly analyse huge amounts of text, they cannot be used alone to create morphosyntactically annotated corpora, because there is always a great degree of morphological ambiguity in … Cited by 1 – Related articles – View as HTML – All 9 versions
[PDF] Matrix: A statistical method and software tool for linguistic analysis through corpus comparison [PDF] from lancs.ac.uk P Rayson – 2003 – eprints.comp.lancs.ac.uk Page 1. Matrix: A statistical method and software tool for linguistic analysis through corpus comparison A thesis submitted to Lancaster University for the degree of Ph.D. in Computer Science September 2002 Paul Edward Rayson B.Sc. Computing Department … Cited by 112 – Related articles – View as HTML – All 13 versions
Ontology Acquisition Process: A Framework for Experimenting with different NLP Techniques [PDF] from lancs.ac.uk R Gacitua, P Sawyer, S Piao… – … of the UK e-Science All …, 2007 – eprints.lancs.ac.uk … analysis and comparison. This tool provides a Web interface for syntactic and semantic corpus annotation tools, and implements standard corpus linguistic methodologies such as frequency lists and concordances. Our research … Related articles – All 12 versions
VoiceYourView: collecting real-time feedback on the design of public spaces [PDF] from lancs.ac.uk J Whittle, W Simm, MA Ferrario… – Proceedings of the 12th …, 2010 – dl.acm.org … Wmatrix is an NLP software tool for corpus analysis and comparison. It is built on the USAS and CLAWS corpus annotation tools, which tag each word in the text with its Part Of Speech (POS) and semantic category (SemTag). … Cited by 3 – Related articles – All 4 versions
Part-of-speech tagging and partial parsing for Irish using finite-state transducers and constraint grammar [PDF] from dcu.ie E Uí Dhonnchadha – 2009 – doras.dcu.ie Page 1. Part-of-Speech Tagging and Partial Parsing for Irish using Finite-State Transducers and Constraint Grammar A thesis submitted for the degree of Doctor of Philosophy Elaine Uí Dhonnchadha Dublin City University Supervisor: Prof. Josef Van Genabith December 2008 … Cited by 1 – Related articles – All 2 versions
[PDF] Interactive pedagogical programs based on constraint grammar [PDF] from sdu.dk L Antonsen, S Huhmarniemi… – Proceedings of the 17th …, 2009 – beta.visl.sdu.dk … Eckhard Bick. 2005. Live use of Corpus data and Corpus annotation tools in CALL: Some new devel- opments in VISL. Holmboe, Henrik (ed.): Nordic Language Technology, °Arbog for Nordisk Sprogtek- nologisk Forskningsprogram 2000-2004, 171-185. … Cited by 2 – Related articles – View as HTML – All 12 versions
[PDF] Generating web-based English preposition exercises from real-world texts [PDF] from purl.org V Metcalf… – Eurocall 2006 Granada. Integrating CALL into study …, 2006 – purl.org Page 1. The WERTi System Vanessa Metcalf and Detmar Meurers Introduction Pedagogical grounding The WERTi system Introduction WERTi and FLT practice Progression in WERTi Example 1: Pronouns Example 2: Passive … Cited by 6 – Related articles – View as HTML – All 8 versions
[PDF] MLAC10 [PDF] from usal.es RA Martín – 2010 – campus.usal.es Page 1. MLAC10 SALAMANCA 5-7 JULY 2010 PROVISIONAL LIST OF ABSTRACTS (participants in alphabetical order) Ideological features in the translation of alternative medicine texts Rasgos ideológicos en la traducción de textos de medicina alternativa … Related articles – View as HTML
BibTeX-Entry [PDF] from dagstuhl.de L Burnard, M Dobreva… – InProceedings {burnard_et_al: …, 2007 – drops.dagstuhl.de Page 1. 06491 Abstracts Collection Digital Historical Corpora – Architecture, Annotation, and Retrieval Dagstuhl Seminar Lou Burnard1, Milena Dobreva2, Norbert Fuhr3 and Anke Lüdeling4 1 Oxford Univ. Computing Services … All 2 versions
[PDF] Greek named entity recognition using support vector machines, maximum entropy and onetime [PDF] from univ-mlv.fr I Michailidis, K Diamantaras… – Proceedings of the 5th …, 2006 – igm.univ-mlv.fr … for all tokens of the two corpora (use of the Hellenic version of Brill’s PoS Tagger (Brill, 1992; Afantenos et al., 2002)) without manual correction • manually tag the Named Entities in the two corpora following CoNLL-2002 annotation guidelines (use of a corpus annotation tool) … Cited by 4 – Related articles – View as HTML – All 12 versions
[CITATION] Compilation and Structuring of a Spanish-Basque Parallel Corpus A Casillas, AD de Illarraza, J Igartua, R Martinez… – 5th SALTMIL Workshop on …, 2006 Cited by 2 – Related articles – All 2 versions
[PDF] ELERFED: Final Report [PDF] from uni-heidelberg.de M Poesio, D Day, R Artstein, J Duncan… – 2008 – cl.uni-heidelberg.de Page 1. ELERFED: Final Report Massimo Poesio, David Day Ron Artstein, Jason Duncan, Vladimir Eidelman, Claudio Giuliano, Rob Hall, Janet Hitzeman, Alan Jern, Mijail Kabadjov, Stanley Yong Wai Keong, Gideon Mann … Related articles – View as HTML – All 4 versions
Meetings in the virtuality continuum: Send your avatar [PDF] from utwente.nl A Nijholt – Cyberworlds, 2005. International Conference on, 2005 – ieeexplore.ieee.org … designer. How does such a development team agree about the design of a remote control? 2.3.Corpus annotation and corpus annotation tools How does a development team agree about the design of a remote control? In order … Cited by 9 – Related articles – All 15 versions
[PDF] Bulgarian MULTEXT-East Corpus-Structure and Content [PDF] from bas.bg L Dimitrova, R Pavlov, K Simov… – Cybernetics and …, 2005 – cit.iit.bas.bg … These tools have been implemented under UNIX. They could be distributed in two main types: corpus annotation tools and corpus exploitation tools – segmenter, morphological analyzer, part-of-speech disambiguator, aligner, etc. … Cited by 5 – Related articles – View as HTML – All 5 versions
[PDF] An Annotated Corpus Management Tool: ChaKi [PDF] from ffzg.hr Y Matsumoto, M Asahara, K Hashimoto… – Proc. 5th International …, 2006 – hnk.ffzg.hr … Annotated corpus Text Data (raw corpus) Corpus Annotation Tools (POS tagger, Dependency parser) / Manual annotation ChaKi Dictionary + … Annotated corpus Text Data (raw corpus) Corpus Annotation Tools (POS tagger, Dependency parser) / Manual annotation ChaKi … Cited by 1 – Related articles – View as HTML – All 7 versions
Efficient sentence retrieval based on syntactic structure [PDF] from upenn.edu I Hiroshi, H Keita, H Taiichi… – … of the COLING/ACL on Main …, 2006 – dl.acm.org Page 1. Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 399-406, Sydney, July 2006. cO2006 Association for Computational Linguistics Efficient sentence retrieval based on syntactic structure … Cited by 2 – Related articles – All 17 versions
Roles and Reusability of Video Data in Social Studies of Interaction. SCARP Case Study No. 5 [PDF] from ed.ac.uk A Whyte – Digital Curation Centre, 2009 – era.lib.ed.ac.uk Page 1. Digital Curation Centre SCARP Project Case Studies ISSN 1759-586X SCARP B4.8.2.2 1 Roles and Reusability of Video Data in Social Studies of Interaction SCARP Case Study No. 5 Angus Whyte Digital Curation Centre, University of Edinburgh … Related articles – All 8 versions
Constraint grammar in dialogue systems L Antonsen, S Huhmarniemi… – … SERIES VOL. 8, 2009 – murre.ut.ee … USA. Eckhard Bick. 2005. Live use of Corpus data and Corpus annotation tools in CALL: Some new developments in VISL. Holmboe, Henrik (ed.): Nordic Language Technology, Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000-2004, 171-185. …
Resonance activation in interactional parliamentary discourse E Zima, G Brône, K Feyaerts… – Linearisation and Segmentation …, 2008 – hf.uio.no … and 1429, respectively). We manually corrected, enriched and annotated the extracted sequences by means of audio and video recordings, using the ELAN video corpus annotation tool (Wittenburg et al. 2006). In its simplest …
Online and off-line visualization of meeting information and meeting support J Zwiers… – 2006
Literature Review on Patient-Friendly Documentation Systems C Hallett, D Hardcastle, D Kokkinakis, C Mancini… – 2006
Preparation and analysis of linguistic corpora N Ide – A companion to digital humanities, 2004 – Wiley Online Library … One of the best known such corpora is the London-Lund Corpus of Spoken English (Svartvik 1990). Corpus annotation tools Over the past decade, several projects have created tools to facilitate the annotation of linguistic corpora. …
Greek Named Entity Recognition using Support Vector Machines I Michailidis, K Diamantaras… – … Conference on Greek …, 2005 – igm.univ-mlv.fr … without manual correction • obtain Part of Speech (PoS) tags for all tokens of the two corpora (use of the Hellenic version of Brill’s PoS Tagger [10][11]) without manual correction • manually tag the Named Entities in the two corpora (use of a corpus annotation tool) • create the …
Toward Socially Intelligent Interviewing Systems NK Person, S D’Mello… – Envisioning the survey …, 2008 – Wiley Online Library … 199 to the UCREL semantic analysis system (USAS) and CLAWS word-tagging corpus annotation tools, and to standard corpus linguistic methodologies such as frequency lists, word concordances, and grammatical and semantic category parsing. …
New perspectives on corpus linguistics K Stuart – RAEL: revista electrónica de lingüística aplicada, 2005 – dialnet.unirioja.es … that may be intersentential. Level 4 enriches the text with the results of sentence-level linguistic analyses. Various corpus annotation tools (sentence segmenters, text tokenisation tools, morphological analysers, POS taggers, etc.) are available to make these tasks easier …
Annotation of error types for a German newsgroup corpus M Becker, A Bredenkamp, B Crysmann… – Treebanks. Building and …, 2003 – Springer … As mentioned above, two sophisticated annotation tools were evaluated: Annotate, the corpus annotation tool of the Negra project (Skut et al., 1997; Brants et al., this volume) and DiET, the multi-purpose annotation tool developed within the DiET project. …
Are You Lying Now? A Linguistic Examination of Deceptive Utterances in Online Conversation B Amos – 2008 – ecommons.library.cornell.edu … 2008). This Natural Language Processing program is a software tool for corpus analysis and comparison, providing an interface to the USAS and CLAWS corpus annotation tools. In addition, it is able to report standard corpus …
Developing an automated semantic analysis system for Early Modern English D Archer, T McEnery, P Rayson… – 2003 – comp.eprints.lancs.ac.uk … Consequently, we chose the second option. The initial stage of the experiment involved submitting a selection of the texts (i.e. 25) to WMATRIX (Rayson 2003), a web-based corpus processing environment, and applying the corpus annotation tools, CLAWS and SEMTAG. …
Ontology Design for Video Semantic Threads JR Kender… – … and Expo, 2005. ICME 2005. IEEE …, 2005 – ieeexplore.ieee.org ONTOLOGY DESIGN FOR VIDEO SEMANTIC THREADS John R. Kender Department of Computer Science Columbia University New York, NY 10027 jrk@cs.columbia.edu Milind R. Naphade IBM TJ Watson Research …
Large scale experiments for semantic labeling of noun phrases in raw text L Guthrie, R Basili, F Zanzotto, K Bontcheva… – Proceedings of the …, 2004 – Citeseer … 4.2. The Annotation Tool The corpus annotation tool takes pre-processed documents, collected in a corpus, and provides the human annotators with an intuitive, fast interface, which enables them to annotate nouns by choosing from a list of valid semantic tags. …
Changing the rules: A comparison of recent trends in English in academic scientific discourse and prescriptive legal discourse E Seoane… – Diachronic perspectives on domain- …, 2006 – books.google.com … Lancashire. Her main research relates to the discoursal practices of the (historical) English courtroom. Other interests include (historical) pragmatics more generally and also corpus annotation/tool development. Recent publications …
An interface for annotating natural interactivity NO Bernsen, L Dybkjær… – Current and New Directions in Discourse …, 2003
Text Parsing for Sign Language Generation with Combinatory Categorial Grammar J Chung… – … on Sign Language Translation and Avatar …, 2011 – vhg.cmp.uea.ac.uk Text Parsing for Sign Language Generation with Combinatory Categorial Grammar Jin-Woo Chung Computer Science Department KAIST Daejeon, South Korea jwchung@nlp.kaist.ac.kr Jong C. Park Computer Science …
Creative Uses of Information Extracted from SMS Messages T Ogle – 2005 – dcs.shef.ac.uk … 4. Tools Related to Information Extraction 4.1 Adaptive Information Extraction Systems 12 4.1.1 Amilcare 4.1.2 LaSIE-II and GATE 4.1.3 FASTUS 4.1.4 TIES 4.2 Corpus Annotation Tools 13 4.2.1 Melita 4.2.2 Other Annotation Tools …
Literature Review on Patient-Friendly Documentation Systems LB HansÅhlfeldt4, P Daumke, N Grabar, C Hallett… – 2006 – mcs.open.ac.uk … 66 8 Corpus Annotation Tools 67 8.1 Requirements . . . . . … 82 8.8 Survey of existing corpus annotation tools (Sweden) . . . . 82 9 Survey of Corpora of Patient Information 83 …
Literature Review on Patient-Friendly Documentation Systems A Hans, L Borin, P Daumke, N Grabar, C Hallett… – 2006 – Citeseer … 66 8 Corpus Annotation Tools 67 8.1 Requirements . . . . . … 82 8.8 Survey of existing corpus annotation tools (Sweden) . . . . 82 9 Survey of Corpora of Patient Information 83 …
Corpora in Minor Languages of India: Some Issues B Mallikarjun – 2004 – Citeseer … 6.4.2 Structural Tagging Tools Part-of-speech tagging, also called grammatical tagging, is the commonest form of corpora tagging. There exist many corpus annotation tools for English such as SARA, BNCWeb, WordSmith, etc. …
Synthesizing mood-affected signed messages: modifications to the parametric synthesis F López-Colino… – International Journal of Human-Computer …, 2011 – Elsevier
Corpus assisted development of a Hungarian morphological analyser and guesser A Novák, V Nagy… – Proceedings of the Corpus …, 2003 – ucrel.lancs.ac.uk … analyser uses. 3 The symbolic guesser module No robust and wide coverage corpus annotation tool set can exist without some means of handling linguistic items uncovered by the initial knowledge of the system. Typical stochastic …
A corpus query tool for syntactically annotated corpora C Merz – 2003 – files.ifi.uzh.ch LIZENTIATSARBEIT DER PHILOSOPHISCHEN FAKULTÄT DER UNIVERSITÄT ZÜRICH A CORPUS QUERY TOOL FOR SYNTACTICALLY ANNOTATED CORPORA Author: Charlotte Merz Supervisor: PD Dr. Martin Volk May 2003 Contents 1 Introduction 1 …
German clause-embedding predicates: an extraction and classification approach E Lapshinova-Koltunski – 2011 – elib.uni-stuttgart.de … 111 4.15 Subcategorisation “inheritance” types . . . . . 112 5.1 Corpora used in the study . . . . . 116 5.2 Corpus annotation tools . . . . . 117 5.3 Token annotation . . . . . …
Rhetorical analysis with rich-feature support vector models D Reitter – Unpublished Master’s thesis, University of Potsdam, …, 2003 – Citeseer Diploma Thesis in Computational Linguistics Rhetorical Analysis with Rich-Feature Support Vector Models David Reitter February 2003 Professor Manfred Stede Supervisor Professor Deb Roy Reader University of Potsdam Contents 1. Introduction 5 1.1. …
Corpora and Linguistic Knowledge (or: a rationalist perspective on corpora) WD Meurers – 2002 – ling.ohio-state.edu … Course web page: http://ling.osu.edu/~dm/2002/spring/795K/ • Corpora and corpus annotation tools page: http://ling.osu.edu/~dickinso/corpus.html http://www.coli.uni-sb.de/sfb378/negra-corpus/ http://www.ims.uni-stuttgart.de/projekte/TIGER/ …
Intelligent Computer-Assisted Language Learning: Implementation to Swahili A Hurskainen – 2009 – njas.helsinki.fi Technical Reports in Language Technology Report No 3, 2009 http://www.njas.helsinki.fi/salama 1 Intelligent Computer-Assisted Language Learning: Implementation to Swahili Arvi Hurskainen Institute for Asian and …