Sentence Splitter 2011

Resources:

See also:

Sentence Extractor | Sentence Grammaticality | Sentence Parsers & Dialog Systems | Sentence Recognizer


SBVR Business Rules Generation from Natural Language Specification I Bajwa, M Lee… – AAAI 2011 Spring Symposium-AI for Business …, 2011 – aaai.org … elements. The core system processes a text into three main processing stages: a. Lexical Processing The lexical processor comprises four sub-modules: a tokenizer, a sentence splitter, POS tagger, and a morphological analyzer. The … Cited by 3 – Related articles – All 3 versions

[PDF] Computational Linguistics Tools Exploited for Automatic Threat Recognition [PDF] from lukassikorski.de L Sikorski, B Haarmann… – … of the NATO RTE IST-099. …, 2011 – lukassikorski.de … In general, an IE processing pipeline consists of at least the following processing modules: tokenizer, gazetteer, sentence splitter, part-of-speech tagger, recognizer for named entities, and a parser. … First, sentences are determined by the sentence splitter. … Cited by 2 – Related articles – View as HTML – All 5 versions

[PDF] Applied Text Mining for Military Intelligence Necessities [PDF] from lukassikorski.de B Haarmann, L Sikorski… – 6th Future Security-Security …, 2011 – lukassikorski.de … Our IE pipeline consists of the following main IE processing modules: • Document Handling • Tokenizer • Gazetteer • Sentence Splitter • Part-of-speech Tagger • Named Entity Recognizer • Chunker 1.1 Document Handling … 1.4 Sentence Splitter … Cited by 1 – Related articles – View as HTML – All 3 versions

[PDF] Personal Translator at WMT2011-A rule-based MT system with hybrid components [PDF] from aclweb.org V Aleksic… – 2011 – aclweb.org … Page 2. html, pdf, doc, txt and rtf formats, sentence splitter, tokeniser, lemmatizer. As Personal Translator is a commercial system, aiming at providing a complete translator work bench and creating added value for users, it integrates a wide range of advanced features such as: … Cited by 1 – Related articles – View as HTML – All 8 versions

[PDF] IPL at ImageCLEF 2011 Medical Retrieval Task [PDF] from aueb.gr Y Gkoufas, A Morou… – Working Notes of CLEF, 2011 – ipl.cs.aueb.gr … To enrich the collection with additional information we extracted from each article all the sentences with a reference to a specific image. To extract those textual references we used the sentence-splitter provided by LingPipe Project and the HTML Parser of Jsoup. … Cited by 1 – Related articles – View as HTML – All 2 versions

Investigating the Statistical Properties of User-Generated Documents [PDF] from rero.ch G Inches, M Carman… – Flexible Query Answering Systems, 2011 – Springer … 4.3 Part-Of-Speech Distribution In the third part of our work we employ GATE and its built-in tokenizer, sentence splitter and Part-Of-Speech (POS) analyser called ANNIE [22,23] to analyse the Part-Of-Speech (POS) tags distribution in the different datasets. We report in Fig. … Cited by 1

[PS] University of Sheffield TREC-8 Q & A system [PS] from shef.ac.uk K Humphreys, R Gaizauskas, M Hepple… – dcs.shef.ac.uk … with appropriate name categories. Sentence Splitter Identifies sentence boundaries in the text body. Brill Tagger [1] Assigns one of the 48 Penn TreeBank part-of-speech tags to each token in the text. Tagged Morph Simple morphological … Cited by 1 – Related articles – View as HTML – All 4 versions

An analysis of gene/protein associations at PubMed scale [HTML] from jbiomedsem.com S Pyysalo, T Ohta… – Journal of Biomedical Semantics, 2011 – jbiomedsem.com … Sentence splitting and constituency syntax not shown. All PubMed documents in the TPS corpus were initially processed with the GENIA sentence splitter with simple heuristic post-processing to correct some errors from the machine learning-based splitter [19]. … Cited by 1 – Related articles – Cached

Terminological resources for text mining over biomedical scientific literature F Rinaldi, K Kaljurand… – Artificial Intelligence in Medicine, 2011 – Elsevier … We use the LingPipe tokenizer and sentence splitter which have been trained on biomedical corpora. … 24. The LingPipe part-of-speech tagger has been trained on biomedical texts and the sentence splitter and tokenizer are also aware of the nature of biomedical texts. … Cited by 1 – Related articles – All 4 versions

[HTML] Using Medical Text Extraction, Reasoning and Mapping System (MTERMS) to Process Medication Information in Outpatient Clinical Notes [HTML] from nih.gov L Zhou, JM Plasek, LM Mahoney… – AMIA Annual …, 2011 – ncbi.nlm.nih.gov … Pre-Processor The sentence splitter uses a set of rules mainly based on punctuations and carriage return (CR). … The Sentence Splitter connects back if the next sentence starts with a lower case and the current sentence’s last character is not “., ?, or !”. … Cited by 1

[PDF] Dependency-Based Rules for Grammar Checking with LanguageTool [PDF] from eucip.pl M Mozgovoy – fedcsis.eucip.pl … Several syntactic elements are backed with additional linguistic modules – sentence splitter and part-of-speech tagger. Sentence splitter determines the boundaries of each sentence, thus allowing the user to find certain … Related articles – View as HTML – All 2 versions

[PDF] Grammar Checking with Dependency Parsing: A Possible Extension for LanguageTool [PDF] from informatica.si M Mozgovoy – informatica.si … Several syntactic elements are backed with additional linguistic modules – sentence splitter and part-of-speech tagger. Sentence splitter determines the boundaries of each sentence, thus allowing the user to find certain tokens … View as HTML

A layered approach for enabling context-sensitive content M Dinsoreanu, B Neacsa, C Cupse… – … and Processing (ICCP …, 2011 – ieeexplore.ieee.org … Following the work from [3], we use a HTML parser to extract text from a webpage, next the content goes through a stop word removal and stemming process, followed by a sentence splitter. … We used OpenNLP sentence splitter [14] to divide our content into various sentences. … Related articles

[PDF] Heterogeneous Natural Language Processing Tools via Language Processing Chains [PDF] from bas.bg D Karagiozov – Student Research Workshop, 2011 – lml.bas.bg … 4 Language Processing Chain In order to achieve a basic set of low-level text annotations the following atomic NLP tools have to be executed in sequence (Cristea and Pistol, 2008): Paragraph splitter (splits the raw text in paragraphs) → Sentence splitter (splits each paragraph … Related articles – View as HTML

E recruiting support system based on text mining methods WKB Abdessalem… – International Journal of Knowledge …, 2011 – Inderscience … The ANNIE information extraction tool consists of components including tokeniser, gazetteer (system of lexicons), PostTagger, sentence splitter, named entity transducer, and OrthoMatcher. … The sentences are identified using annotations generated from the Sentence Splitter. …

Enhancing Search Results with Semantic Annotation Using Augmented Browsing [PDF] from unsw.edu.au HJ Dai, WC Tsai, RTH Tsai… – Twenty-Second International Joint …, 2011 – aaai.org … Our system dynamically enriches the PubMed search results displayed in a user’s browser with semantic annotation provided by several natural language processing (NLP) subsystems, including a sentence splitter, a part-of-speech tagger, a named entity recognizer, a … All 4 versions

[PDF] Feature Extraction and Author Profile Creation from Documents in Malayalam [PDF] from ijart.org C TT II… – ijart.org … Extraction of these class of features need tools like tokenizer, sentence splitter, stemmer, … But they need very accurate NLP tools like sentence splitter, POS tagger, Parser, syntax checker etc. to perform syntactic analysis of texts. … Related articles – View as HTML

System and method for processing insurance claims MR Hail… – US Patent 7,966,204, 2011 – Google Patents … US Patent Jun. 21, 2011, US 7,966,204 B1, FIGURE 3 labels: Text Extractor, Word parser, Sentence splitter, Grammatical parser, dictionary, Data tables … Related articles – All 2 versions

[PDF] Cengage Learning at TREC 2011 Medical Track [PDF] from 59.64.138.216 B King, L Wang… – 59.64.138.216 … We built our final pipeline on this framework, though many of the stages in the GATE pipeline (the tokenizer, the sentence splitter, and the part-of-speech tagger) were modified specifically to perform more accurately in the domain of medical reports. … View as HTML

Building a Rule-Based Malay Text Segmentation Tool B Ranaivo-Malançon – 2011 International Conference …, 2011 – doi.ieeecomputersociety.org … The tool was compared to English and Malay tokenisers to highlight the characteristics of Malay texts. Malay tokeniser; Malay sentence splitter; text segmentation … In this paper, we present a method of building a sentence splitter and tokeniser for Malay, an alphabetic language. …

[PDF] A NOVEL MODEL FOR TIMED EVENT EXTRACTION AND TEMPORAL REASONING IN LEGAL TEXT DOCUMENTS [PDF] from psu.edu K Ramakrishna, V Guda, BP Rani… – International Journal of Computer … – Citeseer … 2.1. Natural Language Processing The NLP system functional components includes: 1. Sentence Splitter. 2. Tokenizer. 3. Shallow parser. … Sentence Splitter first marks individual sentences in the text by using a set of heuristic rules for detecting the end of the sentences. … Related articles – View as HTML – All 4 versions

An ontology-driven search module for accessing chronic pathology literature S Kiefer, J Rauch, R Albertoni, M Attene… – On the Move to …, 2011 – Springer … converts the imported documents, which might be in different formats, to normalized plain text format; Natural Language Processing (NLP) Tool is based on the GATE Framework [6], allows to specify processing pipelines (eg consisting of Sentence Splitter, Tokeniser, Part-of …

Detecting economic events using a semantics-based pipeline [PDF] from eur.nl A Hogenboom, F Hogenboom, F Frasincar… – Database and Expert …, 2011 – Springer … By default, GATE loads the A Nearly-New Information Extraction (ANNIE) system, which consists of several key components, ie, the English Tokenizer, Sentence Splitter, Part-Of-Speech (POS) Tagger, Gazetteer, Named Entity (NE) Transducer, and OrthoMatcher. … Related articles – All 3 versions

Semantic annotation of requirements for automatic UML class diagram generation [PDF] from arxiv.org S Amdouni, WBA Karaa… – Arxiv preprint arXiv:1107.3297, 2011 – arxiv.org … 1). It contains Tokeniser, Gazetteer (system of lexicons), Pos Tagger, Sentence Splitter, Named Entity Transducer, and OrthoMatcher. … In fact, it uses GATE API and especially the following components: sentence splitter, pos tagger, gazetteer, named entity transducer. … Related articles – All 6 versions

Detection of web users’ opinion from multimodal opinion elements KM Kumar – Proceedings of the Fourth Annual ACM Bangalore …, 2011 – dl.acm.org … words. Finally our data set 5 contained nearly 50 opinionated texts with opinion phrases, emoticons and short words as opinion elements. In our approach we pass an opinionated text to a sentence splitter program. The sentences … Related articles

[PDF] From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0 [PDF] from dcu.ie J Foster, O Cetinoglu, J Wagner, J Le Roux, J Nivre… – computing.dcu.ie … Tweets with more than one non-ASCII character were removed, and the remaining tweets were passed through our in-house sentence splitter and tokeniser, resulting in a corpus of 1,401,533 sentences. We refer to this as the TwitterTrain corpus. … Related articles – View as HTML – All 8 versions

[PDF] Extracting Information Science concepts based on Jape Regular Expression [PDF] from wvu.edu A Sawsaa… – Proceedings of the International Conference on …, 2011 – cerc.wvu.edu … (JAPE transducer). Orthomatcher (co-references), NP and VP chanker. Among these modules we used: Tokenisor, Sentence splitter, Gazatteer, JAPE transducer [5]. 3 Methods The process followed the method is based on creating documents-corpora and Gazetteer of … Related articles – View as HTML – All 3 versions

[PDF] Guiding CLASSY Toward More Responsive Summaries [PDF] from umd.edu JM Conroy, JD Schlesinger, PA Rankel… – cs.umd.edu … for this year’s evaluation. Sentence Splitting and Quotation Marks The sentence splitter we introduced last year has been renamed FASST-E (very Fast, very Accurate Sentence Splitter for Text-English). This splitter is routinely … Related articles – View as HTML – All 2 versions

[PDF] Text Analysis beyond Keyword Spotting [PDF] from bastianhaarmann.de B Haarmann, L Sikorski… – bastianhaarmann.de … one’s needs. The standard IE processing pipeline consists at least of the following processing modules: tokenizer, gazetteer, sentence splitter, part-of-speech tagger, … belong together according to linguistic theory. First, sentences are determined by the sentence splitter. … Related articles – View as HTML – All 5 versions

ATLAS Multilingual Language Processing Platform [PDF] from sepln.org M Ogrodniczuk… – Procesamiento de Lenguaje …, 2011 – journal.sepln.org … Table 2: NLP tools in ATLAS (tool type: tool name / source). Paragraph splitter: Regexp-based solution by Tetracom; Sentence splitter: OpenNLP; Tokenizer: OpenNLP; Lemmatizer: RASP; POS tagger: OpenNLP; WSD: LESK-based … Sentence splitter 0,13 3,7% … Related articles – All 3 versions

Using Natural Language Tool to Assist VPRG Automated Extraction from Textual Vulnerability Description HT Le… – Advanced Information Networking and …, 2011 – ieeexplore.ieee.org … etc.), sentence splitter (or text segmentation), stopword analyzer, Part-of-Speech (POS) tagger, morphological analyzer, POS-Tag mapper, Noun chunker, feature analyzer (eg Frequencey analyzer). … Tokeniser Sentence Splitter Stopword Analyzer … Related articles – All 4 versions

[PDF] FRENCH TEXT PREPROCESSING WITH TTL [PDF] from acad.ro A TODIRASCU, ION Radu, M NAVLEA… – acad.ro … 2007). TTL is a collection of interconnected text preprocessing modules (sentence splitter, tokenizer, tagger, lemmatizer and chunker) with resources for Romanian and English but with no resources available for French. We … Cited by 1 – Related articles – View as HTML

Integration of Natural Language Processing Chains in Content Management Systems [PDF] from atlasproject.eu D Karagiozov – Third International Conference on Software, Services …, 2011 – Springer … A sample LPC consist of the following atomic NLP tools [6]: Tokenizer (splits the raw text into tokens) → Paragraph splitter (splits the text in paragraphs) → Sentence splitter (splits the paragraphs in sentences) → POS tagger (marks up each token with its particular part of speech … Related articles – All 6 versions

Ontology Construction Using Computational Linguistics for E-Learning L Jegatha Deborah, R Baskaran… – … : Sustaining Research and …, 2011 – Springer … The contents are posted as a text document. This raw document has to be pre-processed before ontology construction. The statements in the text document are split as individual sentences using Sentence Splitter Tagger which analyzes the stop points using the parser [22]. … Related articles

[PDF] POLUKR (A POLISH-UKRAINIAN PARALLEL CORPUS) AS A TESTBED FOR A PARALLEL CORPORA TOOLBOX [PDF] from domeczek.pl N KOTSYBA – domeczek.pl … sentence splitter, a grammatical dictionary and its editor, aligner with the possibility of … morphosyntactic dictionaries. 6 A rule-based sentence-splitter written in Python by Oresta Tymchyshyn was used for the PolUKR project. … Related articles – View as HTML – All 2 versions

[PDF] SMEE: a tool to extract sorting motif data from PubMed Central abstracts and full text documents [PDF] from sc.edu L Cawthorne – Proceedings of the 49th Annual Southeast Regional …, 2011 – reu.cse.sc.edu … analysis. As a final step, which is currently performed separately, the GENIA sentence splitter is ran with the text as input. … tagging. The Natural Language Toolkit (NLTK) also contains a fine sentence splitter and part-of-speech tagger. … Related articles – View as HTML – All 2 versions

[PDF] Open-source, high performance biomedical term extraction and concept mapping [PDF] from city.ac.uk P Gooch… – vega.soi.city.ac.uk … We used GATE’s Sentence Splitter to identify sentences, and combined GATE’s Tokenizer and Part of Speech Tagger with Java Annotation Patterns Engine (JAPE) rules to identify NP and PP chunks. For example, a simple noun-phrase can be identified using the following rule: … Related articles – View as HTML

Towards a user-friendly webservice architecture for statistical machine translation in the PANACEA project [PDF] from dcu.ie A Toral, P Pecina, A Way… – 2011 – doras.dcu.ie … Each side of the corpus is preprocessed with a sentence splitter (europarl sentence splitter) and a tokeniser (europarl tokeniser). Then the corpus is sentence-aligned using Hunalign. Finally, the outputs of the alignment and the tokenisers are converted to the … Cited by 2 – All 5 versions

Enhanced Anaphora Resolution Algorithm Facilitating Ontology Construction LJ Deborah, V Karthika, R Baskaran… – Advances in Computing …, 2011 – Springer … Procedure: Begin do { // Step 1: Sentence Splitter // Step 2: Resolving typed dependencies among the raw sentences While (end of statement) { Assign Identifier Number for each sentence Describe the grammatical relationships in a sentence among words (nsubj, … Related articles

[PS] Linguistic Processors and Infrastructure [PS] from upc.edu B Magnini, L Bentivogli, A Lavelli, IALEHUJA Batalla… – lsi.upc.edu … Linguistic Processors and Infrastructure Page : 5 Sentence splitter. Morphological analyzer. … Tokenizer Tokenizer UPC UPC TokenPro UPC Sussex LEX-Tokenizer CL-Tokenizer Sentence Splitter Splitter Splitter Splitter Morphological Lemati maco+ Sussex MorphoPro maco+ … Related articles – View as HTML

Systems and methods for generating a decision network from text PJ Talbot, DR Ellis – US Patent 7,917,460, 2011 – Google Patents … US Patent Mar. 29, 2011, Sheet 3 of 6, US 7,917,460 B2, figure labels: TEXT SEGMENT, TOKEN GENERATOR, SENTENCE SPLITTER, PART OF SPEECH CLASSIFIER, WORD LISTS, ENTITY CATEGORIZATION … Related articles – All 4 versions

[PDF] Guide for Users and Developers [PDF] from semanticsoftware.info R Witte, N Naderi… – 2011 – semanticsoftware.info … (standard GATE component). Abbreviation Gazetteer: Identifies the abbreviated form of species, volume and number (see Chapter 7). Sentence splitter: Splits the text into sentences according to punctuation (standard GATE component). … Related articles – View as HTML – All 2 versions

[PDF] Semantic Text Mining for Lignocellulose Research [PDF] from semanticsoftware.info MJ Meurs, C Murphy, I Morgenstern, N Naderi… – 2011 – semanticsoftware.info … types. Finally, the ANNIE Sentence Splitter segments the text into sentences by means of a cascade of finite-state transducers and the Hepple part-of-speech tagger that is included with GATE adds POS tags to each token. … Cited by 1 – Related articles – View as HTML

Visualizing Domain Ontology using Enhanced Anaphora Resolution Algorithm [PDF] from arxiv.org LJ Deborah, R Baskaran… – Arxiv preprint arXiv:1109.2321, 2011 – arxiv.org … Domain: Text Corpus Input: Any web input text corpus Output: List of Anaphors found Procedure: Begin do { // Step 1: Sentence Splitter // Step 2: Resolving typed dependencies among the raw sentences While (end of statement) { Assign Identifier Number for each sentence … Related articles – All 5 versions

[PS] Inventory of Linguistic Processors [PS] from upc.edu A Lavelli… – lsi.upc.edu … Version: Final Inventory of Linguistic Processors Page : 12 3.3 Splitter Type: Sentence Splitter Author: UPC Description: Splits a stream of tokens into sentences Languages: Spanish, Catalan, English Portability: Easy. Requires sentence marked training corpus. … Related articles – View as HTML

[PDF] Natural language processing and e-government: extracting reusable crime report information [PDF] from claremont.edu GA Leroy… – scholarship.claremont.edu … This resource splits the text in the witness narrative into tokens such as numbers, punctuation, and words. – Sentence Splitter. The sentence splitter segments the narrative text into sentences. This is a necessary step for future part-of-speech identification. … Related articles

[PDF] A tool for enhancing MetaMap performance when annotating clinical guideline documents with UMLS concepts [PDF] from city.ac.uk P Gooch… – vega.soi.city.ac.uk … candidate terms. We used GATE’s Sentence Splitter to identify sentences, and combined GATE’s Tokenizer and Part of Speech Tagger with Java Annotation Patterns Engine (JAPE) rules to identify NP, PP and VP chunks. For … Related articles – View as HTML

Integrating DBpedia and SentiWordNet for a tourism recommender system B Varga… – Intelligent Computer Communication and …, 2011 – ieeexplore.ieee.org … Based on these lists, annotations of type Lookup are created for each matching string in the text. This type contains information like Date type, Person names, Places and others. The Sentence Splitter module is used for segmenting the original text into sentences. … Related articles

[PDF] BEwT-E in the TAC 2010 AESOP Task [PDF] from nist.gov S Tratz – nist.gov … summarization tasks. It is also possible that some particular component that BEwT-E relies on, such as the sentence splitter or named entity recognizer, failed to perform well due to some oddities in the data. Conceivably, there … Related articles – View as HTML

[PDF] Comparing the Use of Edited and Unedited Text in Parser Self-Training [PDF] from aclweb.org J Foster, O Cetinoglu, J Wagner… – Conference on Parsing …, 2011 – aclweb.org … quarter of 2010. The content was stripped of HTML markup and passed through an in-house sentence splitter and tokeniser, resulting in a corpus of 1009646 sentences. We call this the FootballTrainDiscussion corpus. Edited Text … Related articles – View as HTML – All 7 versions

[PDF] APPROCHES FOR QUESTION ANSWERING SYSTEMS [PDF] from ijest.info V GUDA, SK SANAMPUDI… – ijest.info … Some of the IR Based systems like AskJeeves, LaSiE system performs text analysis which uses some basic modules like Tokenizer, Sentence splitter, Parse process, Name matcher, Discourse Interpreter [Robert Gaizauskas,(1998)]. … Related articles – View as HTML – All 2 versions

The CHRONIOUS Ontology-Driven Search Tool: Enabling Access to Focused and Up-to-Date Healthcare Literature [PDF] from arxiv.org S Kiefer, J Rauch, R Albertoni, M Attene… – Arxiv preprint arXiv: …, 2011 – arxiv.org … A Transformation Module converts the imported documents to normalized plain text to be processed by the Natural Language Processing (NLP) Tool, which allows to specify processing pipelines (eg consisting of Sentence Splitter, Tokeniser, Part-of-speech Tagger and … Related articles – All 4 versions

[PDF] Opinion Mapping Travelblogs [PDF] from semanticweb.org E Drymonas, A Efentakis… – iswc2011.semanticweb.org … part of analysis. To this task, our processing pipeline comprises of a set of four modules: (i) the ANNIE tokenizer, (ii) the (ANNIE) Sentence Splitter, (iii) the ANNIE POS Tagger and (iv) the WordNet Lemmatiser. The intermediate … Related articles – View as HTML – All 6 versions

Acquiring entailment pairs across languages and domains: A data analysis [PDF] from pascal-network.org M Faruqui, S Padó – 2011 – eprints.pascal-network.org … RCV2 contains over 487,000 news stories in 13 different languages. Almost all news stories cover the business and politics domains. The corpus marks the title of each article; we used the sentence splitter provided by Treetagger (Schmid, 1995) to extract the first sentences. … Related articles – All 8 versions

Semantic Search based on the Online Integration of NLP Techniques K Masuda, T Matsuzaki – Procedia-Social and Behavioral Sciences, 2011 – Elsevier … Procedia – Social and Behavioral Sciences 27 ( 2011 ) 281 — 290 In addition to the above mentioned modules, we used low-level NLP modules like a sentence splitter and POS tagger as well as a more high-level processing module like a sentence rhetorical role tagger [13]. …

MERLIN (Metadata Enrichment for Repositories in a London Institutional Network). Final project report. [PDF] from ucl.ac.uk M Moyle – 2011 – discovery.ucl.ac.uk … NaCTeM. The plain text files created in the MERLIN environment by the two preceding stages are passed to the NacTeM sentence splitter, which employs heuristic rules for identifying the boundaries of sentences and paragraphs. …

[PDF] Extracting Noun Phrases in Subject and Object Roles for Exploring Text Semantics [PDF] from enggjournals.com A Thomas, MK Kowar, S Sharma… – International …, 2011 – enggjournals.com … NLP tools, among which OpenNLP appeared as a group of open source projects based on Maximum Entropy library called as Sharp Entropy, being a C# port of Java library [3]. These Open NLP tools supported by Java MaxEnt library, include a sentence splitter, a tokenizer, a … Related articles – View as HTML – All 3 versions

Personalizing web search using long term browsing history [PDF] from cuhk.edu.hk N Matthijs… – Proceedings of the fourth ACM international …, 2011 – dl.acm.org … The outcome of this algorithm run on two sample web pages can be seen in Table 2. Noun Phrases Noun phrases were extracted by taking the text from each web page and splitting it into sentences using a sentence splitter from the OpenNLP Tools. … Cited by 8 – Related articles – All 7 versions

Semantic search in the World News domain using automatically extracted metadata files L Kallipolitis, V Karpis… – Knowledge-Based Systems, 2011 – Elsevier … 3.3.1 GATE integration and extensions ANNIE consists of a pipeline of components that perform grammatical and syntactical parsing: English Tokenizer, Onto Gazetteer, Sentence Splitter, Part Of Speech (POS) Tagger and Minipar Parser. …

Semiautomatic domain model building from text-data P Šaloun, Z Velart… – Semantic Media Adaptation …, 2011 – ieeexplore.ieee.org … Stanford Named Entity Recognizer. All these tools are included in the package CoreNLP Stanford, which among the three tools also includes a basic tokenizer, sentence splitter and co-reference resolver. The latter activity (resolving …

Developing tools and resources for the biomedical domain of the Greek language A Vagelatos, E Mantzari, M Pantazara… – Health Informatics …, 2011 – jhi.sagepub.com … The annotation process of the IATROLEXI corpus involves almost all NLP components, either adopted or constructed, within the framework of IATROLEXI: a tokenizer, a sentence splitter, a morpho-syntactic tagger, a biomedical gazetteer, a multi-word term recognizer, and an … Related articles – All 2 versions

[PDF] Keyword Extraction for Contextual Advertising [PDF] from ezcom.cn J LIU, C WANG… – ezcom.cn … The preprocessor first parses an HTML document, and returns blocks of text in the body, title information in the header. Because a keyword should not cross sentence boundaries, we apply a sentence splitter to separate text in the same block into various sentences. … Related articles – View as HTML – All 2 versions

Morphological analysis of a non-standard language variety [TXT] from utlib.ee HJ Kaalep… – 2011 – dspace.utlib.ee … The sentences are not annotated. A sentence splitter developed for the standard written Estonian fails to find sentence boundaries in the chatroom texts, as a typical sentence there does not begin with a capital letter nor end with a punctuation mark. … Related articles – All 2 versions

Generating Semantics for the Life Sciences via Text Analytics E Buyko… – Semantic Computing (ICSC), 2011 Fifth …, 2011 – ieeexplore.ieee.org … and the dedicated event extractor. The JREX pre-processor uses a series of text analytics tools such as sentence splitter, tokenizer, POS tagger, chunker, all retrained on the GENIA corpus, and parser. The input data is further … Cited by 1 – Related articles – All 2 versions

A virtual telehealth framework: Applications and technical considerations [PDF] from bham.ac.uk S Kareem… – Emerging Technologies (ICET), 2011 …, 2011 – ieeexplore.ieee.org … Used NLP steps to analyze text are [15]: 1) Lexical Analysis: The input English is passed through a sentence splitter and then each input sentence is handed over to Stanford POS tagger to tokenize the input English text and identify the particular parts of speech of each token. … Related articles – All 3 versions

Sentiment Classification from Online Customer Reviews Using Lexical Contextual Sentence Structure [PDF] from utp.edu.my K Aurangzeb, B Baharum… – 2011 – eprints.utp.edu.my … sentence. Fig. 1. Proposed Architecture for Sentiment Analysis 3.1 Sentence Splitter and Processing Noisy Text In this section the reviews are spitted into sentences to extract feature level sentiment score by SentiWordNet. Making … Related articles

DODT: Increasing requirements formalism using domain ontologies for improved embedded systems development S Farfeleder, T Moser, A Krall… – … and Diagnostics of …, 2011 – ieeexplore.ieee.org … resents a single sentence. NL requirements which consist of several sentences are split into several requirements. The tool uses the GATE sentence splitter to detect sentence beginnings and ends. If more than one sentence … Cited by 1 – Related articles

[PDF] e-Research for Linguists [PDF] from ffzg.hr D Beermann… – ACL HLT 2011, 2011 – hnk.ffzg.hr … to standardisation. 2.1 Interlinear Glossing Online After having imported a text into the Editor which is easily accessed from the site’s navigation bar (New text), the text is run through a simple, but efficient sentence splitter. The … Related articles – View as HTML – All 12 versions

[PDF] “I Thou Thee, Thou Traitor”: Predicting Formal vs. Informal Address in English Literature [PDF] from aclweb.org M Faruqui, S Padó – Proceedings of the 49th Annual Meeting of the …, 2011 – aclweb.org … The files were then formatted to contain one sentence per line and a blank line was inserted to preserve the segmentation information. The sentence splitter and tokenizer provided with EUROPARL (Koehn, 2005) were used. … Related articles – View as HTML – All 9 versions

Patent claim decomposition for improved information extraction [HTML] from pragsem.org P Parapatics… – Current Challenges in Patent Information …, 2011 – Springer … In some claims, elements of an invention are enumerated in a form such as “a.” or “b.”. Since a period (“.”) occurring in this context is interpreted as a sentence delimiter by GATE’s sentence-splitter these constructs lead to erroneous decomposition of claims and are therefore … Cited by 2 – Related articles – All 4 versions

[PDF] A Multilingual Text Normalization Approach [PDF] from univ-aix.fr B Bigi – aune.lpl.univ-aix.fr … It is a package comprising 4 tokenizers: White Space, Regex, Break Iterator and Sentence Tokenizer. Papageorgiou et al. (2000) discuss a regex-based tokenizer and sentence splitter that contains a list of abbreviations for Greek texts. Martínez et al. … Related articles – View as HTML

Event extraction for dna methylation [HTML] from jbiomedsem.com T Ohta, S Pyysalo, M Miwa… – Journal of Biomedical …, 2011 – jbiomedsem.com … for the corpus. For sentence splitting, we applied the GENIA sentence splitter [35], and for gene/protein tagging, we applied the BANNER NER system [33] trained on GENETAG [34] (as for document filtering). The GENETAG … Cited by 4 – Related articles – Cached

D2S: Document-to-sentence framework for novelty detection [PDF] from ipn.mx FS Tsai… – Knowledge and information systems, 2011 – Springer Page 1. Knowl Inf Syst DOI 10.1007/s10115-010-0372-2 REGULAR PAPER D2S: Document-to-sentence framework for novelty detection Flora S. Tsai · Yi Zhang Received: 18 June 2009 / Revised: 22 July 2010 / Accepted: 11 … Cited by 6 – Related articles – All 4 versions

[PDF] A New Method to Retrieve, Cluster And Annotate Clinical Literature Related To Electronic Health Records [PDF] from google.com I Fernandez, A Jimenez-Castellanos… – LOUHI 2011 Third … – sites.google.com … For that purpose it uses a GATE3 application which combines a tokenizer, sentence splitter, part of speech tagger, stemmer and finally an ontology based gazetteer tagger, and tags all the expressions in the text referring to any term in the ontology. … Related articles – View as HTML

[PDF] Enhancing Relevant Region Classifying [PDF] from diva-portal.org T Karlsson… – 2011 – kth.diva-portal.org Page 1. Royal Institute of Technology Master thesis Enhancing Relevant Region Classifying Author: Thomas Karlsson FOI Supervisor: Edward Tjörnhammar March 20, 2011 Page 2. Page 3. Abstract In this thesis we present a new way of extracting relevant data from texts. … Related articles – View as HTML

[HTML] Using UMLS Lexical Resources to Disambiguate Abbreviations in Clinical Text [HTML] from nih.gov Y Kim, J Hurdle… – AMIA Annual Symposium …, 2011 – ncbi.nlm.nih.gov … preprocessing. We use a sentence splitter adapted from openNLP and using the cTAKES 13 trained model, and a simple tokenizer. The abbreviation annotator uses pattern matching and LRABR to annotate all abbreviations in our text. …

[PDF] Automatic Metadata Generation in an Archaeological Digital Library: Semantic Annotation of Grey Literature [PDF] from glam.ac.uk A Vlachidis, C Binding, D Tudhope… – hypermedia.research.glam.ac.uk … The architecture makes available a range of language processing resources, such as the Tokenizer, Sentence Splitter and Part-of-Speech tagger, as part of the default application ANNIE (A Newly New Information Extraction System). … Related articles – View as HTML

[PDF] Automatic Metadata Generation for Semantic Indexing in an Archaeological Digital Library via the CIDOC CRM [PDF] from glam.ac.uk A Vlachidis, C Binding, D Tudhope… – hypermedia.research.glam.ac.uk … The architecture makes available a range of language processing resources, such as the Tokenizer, Sentence Splitter and Part-of-Speech tagger, as part of the default application ANNIE (A Newly New Information Extraction System). … Related articles – View as HTML – All 2 versions

[PDF] D2.3 Scaling Strategy [PDF] from project-first.eu M Radzimski, M Kalender, M Grcar, M Brakus… – project-first.eu Page 1. Project Acronym: FIRST Project Title: Large scale information extraction and integration infrastructure for supporting financial decision making Project Number: 257928 Instrument: STREP Thematic Priority: ICT-2009-4.3 Information and Communication Technology … Related articles – View as HTML – All 5 versions

[PDF] Semantic Analysis of Military Relevant Texts for Intelligence Purposes [PDF] from dodccrp.org M Hecking… – en.dodccrp.org … Figure 4 shows some of the annotations that result from structural analysis. The text is tokenized and sentence splitting is done (see token and sentence annotations in Figure 4). A tokenizer and a sentence splitter are provided by GATE software. … View as HTML

LEXA: Towards Automatic Legal Citation Classification F Galgani… – AI 2010: Advances in Artificial Intelligence, 2011 – Springer … We use the Tokenizer, Sentence Splitter, Part of Speech Tagger and Stemmer resources (provided with GATE) to generate Token annotations and their corresponding features for input texts, building the first layer of linguistic annotations. … Related articles

Semantic Annotation Semantically: Using a Shareable Extraction Ontology and a Reasoner [PDF] from thinkmind.org J Dedek, P Vojtas – … 2011, The Fifth International Conference on …, 2011 – thinkmind.org … and English. We are using a majority of applicable tools from TectoMT: a tokeniser, a sentence splitter, morphological analyzers (including POS tagger), a syntactic parser and the deep syntactic (tectogrammatical) parser. All …

Pattern and keyword based opinion analysis from opinionated texts KM Kumar – Proceedings of the International Conference & …, 2011 – dl.acm.org … Our approach of opinion detection is accomplished using two phases. In phase1 , we pass an opinionated text to sentence splitter program. The sentences obtained from the program were input to a part of speech tagger. The tagger used in our approach is Monty Tagger [14]. … Related articles

[HTML] Evaluating Measures of Redundancy in Clinical Texts [HTML] from nih.gov R Zhang, S Pakhomov, BT McInnes… – AMIA Annual …, 2011 – ncbi.nlm.nih.gov … remain most important. After each complete patient record was split into individual clinical notes, each note was further separated into small chunks of text at a sentence/statement level using a rule-based sentence splitter. Due to …

[PDF] THE EXTRACTION OF PHARMACOGENETIC AND PHARMACOGENOMIC RELATIONS-A CASE STUDY USING PHARMGKB [PDF] from stanford.edu E BUYKO, E BEISSWANGER… – psb.stanford.edu … 45.7% recall and 51.0% F-score (,2122), and thus considerably narrowed the gap to the winner of the BioNLP’09 Shared Task who scored at 51.95% F-score.g As far as pre-processing is concerned, JReX uses JCore tools24 such as JulieLab’s sentence splitter and tokenizer. … Related articles – View as HTML

Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux 2 Stephen Hogan 3, Joakim Nivre 4, Deirdre Hogan and Josef van Genabith [PDF] from cngl.ie S Hogan – 2011 – aaai.org … Tweets with more than one non-ASCII character were removed, and the remaining tweets were passed through our automatic sentence splitter and tokeniser, resulting in a corpus of 1,401,533 sentences. We refer to this as the TwitterTrain corpus. … Related articles – All 3 versions

[PDF] An Approach to Automatic Music Band Member Detection Based on Supervised Learning [PDF] from jku.at P Knees – cp.jku.at … the subsequent PRs. Typically, a GATE pipeline consists of the following PRs: 1. Tokenizer: splits the text into tokens based on white spaces. 2. Sentence Splitter: splits the text into sentences based on punctuation. 3. Part of … View as HTML
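
The first two processing resources (PRs) named in the snippet above can be sketched as plain functions. This is an illustrative sketch only and does not use GATE or its API; `tokenize` and `split_sentences` are hypothetical names, and the rule "a token ending in `.`, `!` or `?` closes a sentence" is an assumption standing in for "based on punctuation".

```python
def tokenize(text):
    """Tokenizer step: split the text into tokens on white space."""
    return text.split()

def split_sentences(tokens):
    """Sentence splitter step: close a sentence after any token that ends
    in sentence-final punctuation (assumed here to be '.', '!' or '?')."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok.endswith((".", "!", "?")):
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack final punctuation
        sentences.append(current)
    return sentences

tokens = tokenize("GATE is modular. It runs PRs in order")
for sentence in split_sentences(tokens):
    print(sentence)
```

Running the splitter on the tokenizer's output mirrors the ordering in the list, where each PR consumes what the previous one produced.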

[PDF] Crowdsourcing syntactic relatedness judgements for opinion mining in the study of information technology adoption [PDF] from aclweb.org AB Sayeed, B Rusk, M Petrov, HC Nguyen… – ACL HLT 2011, 2011 – aclweb.org … vector. We then selected all the sentences that contain IT concept mentions from the entire Information Week corpus using an OpenNLP 1.4.3 model as our sentence-splitter. This produced approximately 77K sentences. Every … Cited by 1 – Related articles – View as HTML – All 15 versions

[PDF] PROCEDURES FOR THE TRANSLATION OF BOUNDARY DESCRIPTION TEXTS INTO GEOGRAPHIC LOCATION FORMAT [PDF] from itc.nl MBJEM MBANO – 2011 – itc.nl Page 1. PROCEDURES FOR THE TRANSLATION OF BOUNDARY DESCRIPTION TEXTS INTO GEOGRAPHIC LOCATION FORMAT MASIDA BENNETTE JEM MBANO February, 2011 SUPERVISORS: Prof. Dr. Ir. Martien Molenaar Prof. Dr. Menno – Jan Kraak Page 2. … Related articles

[PDF] Information Extraction and Opinion Organization for an e-Legislation Framework for the Philippine Senate [PDF] from hltd.org A Borra, CCREO Roxas… – hltd.org … resolution, template filler and evaluation. Under the preprocessor, there are 6 submodules: tokenizer, sentence splitter, cross-reference, part of speech tagger, unknown word and named entity recognition. In a nutshell, a document … Related articles – View as HTML – All 2 versions

[PDF] SentiProfiler: Creating Comparable Visual Profiles of Sentimental Content in Texts [PDF] from aclweb.org T Kakkonen… – Language Technologies for Digital …, 2011 – aclweb.org … document. SA is performed in SentiProfiler with a GATE pipeline that consists of three basic ANNIE components (sentence splitter, word tokenizer and POS tagger), GATE morphological analyzer and an ontology-based tagging tool. … View as HTML

Towards Automatic Pathway Generation from Biological Full-Text Publications E Buyko, J Linde, S Priebe… – Advances in Intelligent Data …, 2011 – Springer … Page 7. Towards Automatic Pathway Generation 73 Fig. 1. Trimming of dependency graphs 4.1 JReX Pre-processor As far as pre-processing is concerned, JReX uses the JCore tool suite [7], eg, the JulieLab sentence splitter and tokenizer. … Related articles – All 2 versions

Towards using web-crawled data for domain adaptation in statistical machine translation [PDF] from dcu.ie P Pecina, A Toral, A Way, V Papavassiliou… – 2011 – doras.dcu.ie … In each paragraph pair we applied the following steps: identification of sentence boundaries by the Europarl sentence splitter, tokenization by the Europarl tokenizer, and sentence alignment by Hunalign,10 a widely used tool for automatic identification of parallel sentences … Cited by 1 – Related articles – All 4 versions

Distribution of “Characteristic” Terms in MEDLINE Literatures [PDF] from mdpi.com NR Smalheiser, W Zhou… – Information, 2011 – mdpi.com … regardless of how many times the term occurred within the same abstract.) For each occurrence of a term within a MEDLINE record, we noted its location within title, abstract, or last sentence in abstract. Sentence boundaries were identified using the Sentence Splitter [10]. … Related articles – All 3 versions

[PDF] Building a Named Entity Recognizer in Three Days: Application to Disease Name Recognition in Bulgarian Epicrises [PDF] from nus.edu.sg GD Georgiev, V Zhikov, B Popov… – sterling.ddns.comp.nus.edu.sg … Among its features that have helped us the most were (i) its ability to extract text from Microsoft Word documents, (ii) its default Unicode tokenizer and (iii) its sentence splitter based on simple regular expressions, which we were able to adopt very quickly, thus overcoming the … Related articles – View as HTML – All 2 versions

[PDF] Sentence First CAPTCHA [PDF] from diva-portal.org P Johansson… – 2011 – hig.diva-portal.org Page 1. Beteckning:_____ Akademin för teknik och miljö Sentence First CAPTCHA Proposal and study of a text based CAPTCHA scheme Patrik Johansson & Robert Östlund June 2011 Bachelor Thesis, 15 hp, C Computer Science … Related articles – View as HTML

[PDF] Promoting Interoperability of Resources in META-SHARE [PDF] from ijcnlp2011.org P Thompson, Y Kano, J McNaught… – … and Services in the …, 2011 – ijcnlp2011.org … An example of a possible workflow for carrying out named entity recognition is the following: Sentence Splitter → Tokeniser → POS Tagger → Syntactic Parser → Named Entity Recogniser In combining resources together, it is only necessary to ensure that the types of annotation … View as HTML

[PDF] Mining Millions of Reviews: A Technique to Rank Products Based on Importance of Reviews [PDF] from icec11.org K Zhang, Y Cheng, W Liao… – 2011 – icec11.org … factor. Note that β and δ have different values when calculating the age weights for products from different categories. Sentence Splitter and Part-Of-Speech Tagging A customer review typically consists of several sentences. It … Related articles – View as HTML – All 4 versions

BioNOT: A searchable database of biomedical negated sentences [PDF] from biomedcentral.com S Agarwal, H Yu… – BMC bioinformatics, 2011 – biomedcentral.com … sentences) and (3) full-text of articles published by Elsevier publisher (~1.9 million articles; ~215 million sentences). We split articles for sentences using the NaCTeM sentence splitter [14]. Using NegScope to detect scope of negation … Related articles – All 4 versions

Automatic Discovery of Complementary Learning Resources V Romero Zaldivar, R Crespo García… – Towards Ubiquitous …, 2011 – Springer … Gate includes a set of algorithms for natural language processing called “ANNIE” (A Nearly New IE System) consisting of tools such a tokenizer, a sentence splitter or a Part-of-speech (POS) tagger, etc. (see [8] for a more detailed description). … Related articles

[PDF] Tapta: A user-driven translation system for patent documents based on domain-aware Statistical Machine Translation [PDF] from mt-archive.info B Pouliquen, C Mazenc… – mt-archive.info … Our sentence splitter relies on sentence boundaries and list of abbreviations. When the sentence is too long, our tool splits also on segment boundaries (ie comma, semi-column, reference etc.), see Table 1 for an example. … Related articles – View as HTML
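
The two-stage strategy described in the snippet above (boundary detection guarded by an abbreviation list, then a further split of over-long sentences at segment boundaries) can be sketched as follows. This is not the Tapta implementation; the abbreviation list, the `MAX_WORDS` threshold and all function names are hypothetical, and only commas and semicolons are treated as segment boundaries here.

```python
import re

ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "fig."}  # hypothetical list
MAX_WORDS = 12                                     # hypothetical length limit

def split_sentences(text):
    """Close a sentence after a token ending in '.', '!' or '?',
    unless the token is a known abbreviation."""
    sentences, current = [], []
    for tok in text.split():
        current.append(tok)
        if tok.endswith((".", "!", "?")) and tok.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

def split_long(sentence, max_words=MAX_WORDS):
    """Split a sentence longer than max_words after each comma or semicolon."""
    if len(sentence.split()) <= max_words:
        return [sentence]
    return [part for part in re.split(r"(?<=[,;])\s+", sentence) if part]

def segment(text, max_words=MAX_WORDS):
    """Sentence splitting followed by segment splitting of long sentences."""
    segments = []
    for sentence in split_sentences(text):
        segments.extend(split_long(sentence, max_words))
    return segments

print(segment("See Fig. 2 for details. The claim follows."))
```

Splitting only when a sentence exceeds the limit keeps short sentences intact, which matches the "when the sentence is too long" condition in the snippet.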

[PDF] iPlag: Intelligent Plagiarism Reasoner in Scientific Publications [PDF] from softcomputing.net S Alzahrani, N Salim, A Abraham… – sip04.softcomputing.net … captions, equations, figure captions, footnotes, etc. Structural components are further segmented into sentences using a sentence splitter, and all texts are tokenised (ie divided into terms). As a result, scientific publications are … View as HTML

[PDF] Lost in specialised translation: the corpus as an inexpensive and under-exploited aid for language service providers [PDF] from mt-archive.info GC Pastor – mt-archive.info … The Corpus Manager allows to upload, update, inspect or remove monolingual and parallel corpora; whereas the Task Manager includes utilities to process the corpora, such as pre-processing tools (sentence splitter, tokeniser and external part-of-speech tagger and shallow … View as HTML

[PDF] Ripple Down Rules for Question Analysis [PDF] from vilangtek.com NQ Dat – 2011 – vilangtek.com … The ANNIE’s tokeniser separates the text into simple tokens such as numbers, punctuation and words according to different types. • The sentence splitter segments the text into sentences. • The POS tagger produces a part-of-speech tag as an annotation on each word Related articles – View as HTML

Semantic Analysis of Military Relevant Texts for Intelligence Purposes M Hecking… – 2011 – stormingmedia.us … Figure 4 shows some of the annotations that result from structural analysis. The text is tokenized and sentence splitting is done (see token and sentence annotations in Figure 4). A tokenizer and a sentence splitter are provided by GATE software. … Related articles

Semantic Analysis of Military Relevant Texts for Intelligence Purposes [PDF] from dtic.mil S Noubours – 2011 – DTIC Document … Figure 4 shows some of the annotations that result from structural analysis. The text is tokenized and sentence splitting is done (see token and sentence annotations in Figure 4). A tokenizer and a sentence splitter are provided by GATE software. … Related articles – All 5 versions

Advances in deep parsing of scholarly paper content U Schäfer… – Advanced Language Technologies for Digital …, 2011 – Springer … It can be ignored for the time being because the NLP tools used also do not understand mathematics. After text extraction, a sentence splitter segments into sentence units in order to provide suitable input for subsequent NLP. … Cited by 2 – Related articles – All 3 versions

[PDF] Sentiment Classification Using Sentencelevel Lexical Based Semantic Orientation of Online Reviews [PDF] from docsdrive.com A Khan, B Baharudin… – vol, 2011 – docsdrive.com … 1: Proposed Architecture for sentiment analysis Table 1: POS Tags Pos-id POS_name POS_abbrivation SentiWordNet_abrv 1 Noun NN n 2 Adjective JJ a 3 Verb VB v 4 Adverb RB r 5 Nouns NNS n 6 Adjectives JJS a 7 Verbs VBZ v Sentence splitter and processing noisy text … Cited by 3 – Related articles – All 3 versions

Semi-automatic ESOL error annotation ØE Andersen – English Profile Journal, 2011 – Cambridge Univ Press … This partly corrected version of the text is then passed through RASP’s sentence splitter and tokeniser, providing input to the second annotator, annotator 2, which detects morphological …

[PDF] Building a Project Memory Using Semantic Design Rationale Process [PDF] from scirp.org S Gueraich… – Journal of Software Engineering, 2011 – scirp.org … Associating grammatical labelling to each token such noun, verb, adjective. In our architecture, we integrate two modules to perform the first two stages which are the Sentence-Splitter and the Tokenizer and we use the grammatical TreeTagger for labelling. … Related articles – View as HTML – All 3 versions

[HTML] Using electronic patient records to discover disease correlations and stratify patient cohorts [HTML] from plos.org FS Roque, PB Jensen, H Schmock, M Dalgaard… – PLoS Comput …, 2011 – dx.plos.org PLoS Computational Biology is an open-access. Cited by 2 – Related articles – Cached – All 9 versions

[PS] SENSEVAL Word-Sense Disambiguation Using a Different Sense Inventory and Mapping to WordNet [PS] from clres.com KC Litkowski – clres.com Page 1. SENSEVAL Word-Sense Disambiguation Using a Different Sense Inventory and Mapping to WordNet Kenneth C. Litkowski CL Research 9208 Gue Road Damascus, MD 20872 ken@clres.com Abstract In SENSEVAL … Related articles – View as HTML

[PDF] Fast and robust joint models for biomedical event extraction [PDF] from aclweb.org S Riedel… – Proceedings of the Conference on Empirical …, 2011 – aclweb.org Page 1. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1-12, Edinburgh, Scotland, UK, July 27-31, 2011. c 2011 Association for Computational Linguistics Fast and Robust Joint Models for Biomedical Event Extraction … Cited by 2 – Related articles – View as HTML – All 5 versions

Mining Gene-centric Relationships from Literature to Support Drug Discovery L Tari, J Küntzer, J Patel, Y Li… – 2011 IEEE …, 2011 – doi.ieeecomputersociety.org … large 1 JULIE lab sentence splitter and tokenizer: http://www.julielab.de/Resources/ Software/NLP_Tools.html 2 CNI is an internal effort within Roche for the compilation of an ontology for chemical compounds. 641 Page 4. number …

[PDF] Inductive Logic Programming in an Agent System for Ontological Relation Extraction [PDF] from ijmlc.org MDS Seneviratne… – ijmlc.org … process. Linguistic processing and pattern matching rules are used in GATE for information extraction. ANNIE is bundled with language processing tools Sentence Splitter, Tokenizer and Part of Speech Tagger. Those tools … Related articles – View as HTML

[PDF] Entity Identification & Co-reference Resolution [PDF] from shef.ac.uk A JAYAPAL… – 2011 – dcs.shef.ac.uk Page 1. THE UNIVERSITY OF SHEFFIELD DISSERTATION – FINAL REPORT Entity Identification & Co-reference Resolution IN THE AUTOMATIC KNOWLEDGE BASE POPULATION – SLOT-FILLING TASK Author(s): Arun JAYAPAL Supervisor(s): Prof. Rob GAIZAUSKAS … View as HTML

SVM based extraction of spatial relations in text X Zhang, C Zhang, C Du… – Spatial Data Mining and …, 2011 – ieeexplore.ieee.org … In this section, we first provide the overview of the framework of the extraction of spatial relations (Fig. 1). Then, we explain the details for each module in the framework. POS tagging, Sentence splitter GeoCorpus Place name recognition Evaluation model Annotation set … Related articles

Adaptations based on ontology evolution as a mean to exploit collective intelligence L Burzagli, F Gabbanini… – Universal Access in Human- …, 2011 – Springer … Text processing pipeline Document Clean up POS Tagger Sentence Tokenizer Sentence Splitter JAPE Transducer(s) Text document Annotated document … Fig. 3. An example text processing pipeline Page 8. 334 L. Burzagli, F. Gabbanini, and PL Emiliani … Related articles – All 2 versions

Extracção de informação de relatórios médicos [PDF] from ua.pt L Ferreira, A Teixeira… – Electrónica e Telecomunicações, 2011 – portal.doc.ua.pt … ANNIE is thus a family of processing resources for linguistic analysis, such as the Tokeniser, Gazetteer, Sentence Splitter, Part-of-Speech Tagger, Semantic Tagger, Orthographic Coreference (OrthoMatcher) and Pronominal Coreference. … Related articles – All 2 versions

Automatic discovery of complementary learning resources VAR Zaldivar, RMC García, D Burgos… – … Learning, Ec-tel …, 2011 – books.google.com … Gate includes a set of algorithms for natural language processing called “ANNIE”(A Nearly New IE System) consisting of tools such a tokenizer, a sentence splitter or a Part-of-speech (POS) tagger, etc.(see [8] for a more detailed description). … Related articles – All 2 versions

Parsed use case descriptions as a basis for object-oriented class model generation M Elbendak, P Vickers… – Journal of Systems and Software, 2011 – Elsevier … As will be explained in more detail later, the parsing process as a whole involves a tokenizer as a preliminary stage, a sentence splitter, a part-of-speech tagger and chunking followed by the parser itself. … Sentence Splitting: The sentence splitter identifies sentence boundaries. … Related articles – All 3 versions

Using domain ontologies for finding experts in corporate wikis R Schäfermeier… – … of the 7th International Conference on …, 2011 – dl.acm.org … trust Figure 2: Growth of Trust in a User Contribution over Time Expert Model Domain Model Text Analysis Components Wiki API Concept Matcher POS Tagger Sentence Splitter / Tokenizer Stemmer Wiki Markup Parser Similarity Metrics Authorship Component Wiki System …

Information Extraction and Semantic Annotation for Multi-Paradigm Information Management H Cunningham, V Tablan, I Roberts… – Current Challenges in …, 2011 – Springer … Annotates Roman numerals which are used for detecting references Numbers in Words Recognises numbers written as words and converts them to actual values Tokeniser Pattern matcher for detection of words and other lexical items Sentence splitter Regular expression … Cited by 1 – Related articles – All 2 versions

3 Semantic Annotations and Retrieval: Manual, Semiautomatic, and Automatic Generation KBH Cunningham – Springer Page 1. 3 Semantic Annotations and Retrieval: Manual, Semiautomatic, and Automatic Generation Kalina Bontcheva . Hamish Cunningham University of Sheffield, Sheffield, UK 3.1 Scientific and Technical Overview . . . . . … Related articles

EXTRACTING BIO-MOLECULAR EVENTS FROM LITERATURE-THE BIONLP’09 SHARED TASK JD Kim, T Ohta, S Pyysalo… – Computational …, 2011 – Wiley Online Library … 2005), and a version of the C&C CCG deep parser 9 adapted to biomedical text (Rimell and Clark 2009). Prior to parsing, the text of all documents was segmented using the GENIA Sentence Splitter and tokenized using the GENIA Tagger, both provided by U-Compare. … Cited by 2

[PDF] estado de la investigación [PDF] from 132.248.242.3 A Gelbukh – Memoria del I Simposio Internacional sobre … – 132.248.242.3 … information retrieval Page 356. 344 … non-trivial … of this task. A program that performs this task is called a sentence splitter in English. Removal of junk words (stopwords in English). For certain applications, the … Related articles – View as HTML

An event-centric model for multilingual document similarity [PDF] from 202.113.25.19 J Strötgen, M Gertz… – … of the 34th international ACM SIGIR …, 2011 – dl.acm.org Page 1. An Event-centric Model for Multilingual Document Similarity Jannik Strötgen Institute of Computer Science Heidelberg University Heidelberg, Germany stroetgen@uni-hd.de Michael Gertz Institute of Computer Science … Related articles – All 3 versions

[PDF] ASSESSING THE QUALITY FACTORS FOUND IN IN-LINE DOCUMENTATION WRITTEN IN NATURAL [PDF] from semanticsoftware.info N Khamis – 2011 – semanticsoftware.info … 102 C.1 JAPE Transducer. . . . . 102 C.2 Tokenizer . . . . . 102 C.3 Sentence Splitter . . . . . 102 C.4 Part-Of-Speech Tagger . . . . . 103 C.5 Gazetteer . . . . . … Related articles – View as HTML – All 2 versions

A semantic graph-based approach to biomedical summarisation L Plaza, A Díaz… – Artificial Intelligence in Medicine, 2011 – Elsevier … irrelevant sentences. • Finally, the text in the body section is split into sentences using the tokenizer, part of speech tagger and sentence splitter modules of the GATE architecture for text engineering [51] . The preprocessing … Cited by 1 – Related articles – All 4 versions

Spoken Question Answering S Rosset, O Galibert… – Spoken Language …, 2011 – Wiley Online Library Page 1. 6 Spoken Question Answering Sophie Rosset1, Olivier Galibert1,2 and Lori Lamel1 1 CNRS-LIMSI, France 2 LNE, France This chapter covers Question-Answering (QA) from spoken documents (referred to as QAst), but also beyond, where questions are also spoken. … Related articles

A crime reports analysis system to identify related crimes CH Ku… – Journal of the American Society for …, 2011 – Wiley Online Library … Our sentence is tokenized as follows: I / saw / a / guy / who / wears / a / deep / blue / jacket / trying / to / rob / the / bank / • The sentence splitter is a finite-state transducer that segments the text into sentences. There is only one sentence in our example. … Related articles – All 4 versions

[PDF] Automated Extraction of Protein Mutation Impacts from the Biomedical Literature [PDF] from concordia.ca N Naderi – 2011 – spectrum.library.concordia.ca Page 1. AUTOMATED EXTRACTION OF PROTEIN MUTATION IMPACTS FROM THE BIOMEDICAL LITERATURE NONA NADERI A THESIS IN THE DEPARTMENT OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING … View as HTML

[PDF] TUD Palladian Overview [PDF] from 141.76.40.242 D Urbansky, K Muthmann, P Katz… – 2011 – 141.76.40.242 Page 1. TUD Palladian Overview David Urbansky, Klemens Muthmann, Philipp Katz, Sandro Reichert TU Dresden, Department of Systems Engineering, Chair Computer Networks, IIR Group, Germany October 5, 2011 Page 2. 2 Overview of TUD Palladian Page 3. Contents … Cited by 1 – Related articles – View as HTML – All 7 versions

Mapping geospatial events based on extracted spatial information from web documents [PDF] from uiowa.edu NR Rock – 2011 – ir.uiowa.edu Page 1. University of Iowa Iowa Research Online Theses and Dissertations 2011 Mapping geospatial events based on extracted spatial information from web documents Nathaniel Robert Rock University of Iowa This dissertation … Related articles – All 2 versions

[PDF] Tutorial on Statistics, Probability and Information Theory for Language Engineers [PDF] from esole-eg.org IF Imam – esole-eg.org Page 1. Prof. Ibrahim F. Imam Full Professor and Assistant Dean, College of Computing and Information Technology Arab Academy for Science, Technology & Maritime Transport, Cairo Tutorial on Statistics, Probability and Information Theory for Language Engineers … View as HTML

[HTML] Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network [HTML] from genomebiology.com J Hur, Z Xiang, EL Feldman… – BMC immunology, 2011 – genomebiology.com … Pathogen-related literature is collected through PubMed. The titles and abstracts of the retrieved documents are pre-processed by a sentence splitter, and analyzed by the VO-SciMiner to identify vaccine VO terms and pathogen gene names. … Related articles – Cached – All 11 versions

[PDF] Content Management and Knowledge Management: Two Faces of Ontology-based Deep-Level Interpretation of Text [PDF] from tu-harburg.de ISE Peraldi – sts.tu-harburg.de Page 1. Content Management and Knowledge Management: Two Faces of Ontology-based Deep-Level Interpretation of Text Vom Promotionsausschuss der Technischen Universität Hamburg-Harburg zur Erlangung des akademischen Grades … Related articles – View as HTML – All 2 versions

[PDF] Basic Design of the architecture and methodologies (second round) [PDF] from upc.edu G Rigau, B Magnini… – lsi.upc.edu … Identifier Tokenizer Tokenizer UPC UPC TokenPro UPC Sussex LEX-Tokenizer CL-Tokenizer Sentence Splitter Splitter Splitter SentencePro Splitter … Morphological analyzer: improvement of the tool, lexicon extension and debugging. • Sentence splitter: partial improvement. … Related articles – View as HTML – All 4 versions

Transition of legacy systems to semantically enabled applications: TAO method and tools HH Wang, D Damljanovic, T Payne, N Gibbins… – Semantic Web, 2011 – IOS Press Page 1. Semantic Web 0 (2011) 1-12 DOI 10.3233/SW-2011-0039 IOS Press … Related articles

Generation and analysis of verbal route directions for blind navigation [PDF] from umi.com J Nicholson – 2011 – gradworks.umi.com Page 1. GENERATION AND ANALYSIS OF VERBAL ROUTE DIRECTIONS FOR BLIND NAVIGATION by John Nicholson A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Computer Science Approved: … Related articles – Library Search – All 4 versions

Combining Natural Language Processing And Statistical Text Mining: A Study Of Specialized Versus Common Languages [PDF] from usf.edu J Jarman – 2011 – scholarcommons.usf.edu Page 1. University of South Florida Scholar Commons Theses and Dissertations 1-1-2011 Combining Natural Language Processing And Statistical Text Mining: A Study Of Specialized Versus Common Languages Jay Jarman University of South Florida, jay@jayjarman.com …

[PDF] Grammatical error prediction [PDF] from psu.edu ØE Andersen – month, 2011 – Citeseer Page 1. Technical Report Number 794 Computer Laboratory UCAM-CL-TR-794 ISSN 1476-2986 Grammatical error prediction Øistein E. Andersen January 2011 15 JJ Thomson Avenue Cambridge CB3 0FD United Kingdom phone +44 1223 763500 http://www.cl.cam.ac.uk/ … Related articles – View as HTML – All 7 versions

Semantic information systems engineering: A query-based approach for semi-automatic annotation of web services [PDF] from brunel.ac.uk MM Al-Asswad – School of Information Systems, …, 2011 – v-scheiner.brunel.ac.uk Page 1. Semantic Information Systems Engineering: A Query-based Approach for Semi-automatic Annotation of Web Services A thesis submitted for the degree of Doctor of Philosophy By Mohammad Mourhaf AL Asswad Department of Information Systems and Computing, … Related articles – All 2 versions

Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics [PDF] from biomedcentral.com L Chepelev, A Riazanov, A Kouznetsov… – BMC …, 2011 – biomedcentral.com Page 1. This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. Prototype Semantic Infrastructure for Automated Small Molecule Classification and Annotation in Lipidomics … Cited by 1 – Related articles – All 8 versions

A distributional and syntactic approach to fine-grained opinion mining [PDF] from umd.edu AB Sayeed – 2011 – drum.lib.umd.edu Page 1. ABSTRACT Title of dissertation: A DISTRIBUTIONAL AND SYNTACTIC APPROACH TO FINE-GRAINED OPINION MINING Asad Basheer Sayeed, Doctor of Philosophy, 2011 Dissertation directed by: Professor Amy … Related articles

Uso de grafos semánticos en la generación automática de resúmenes y estudio de su aplicación en distintos dominios: biomedicina, periodismo y turismo [PDF] from ucm.es L Plaza Morales – 2011 – eprints.ucm.es Page 1. UNIVERSIDAD COMPLUTENSE DE MADRID FACULTAD DE INFORMÁTICA Departamento de Ingeniería del Software e Inteligencia Artificial USO DE GRAFOS SEMÁNTICOS EN LA GENERACIÓN AUTOMÁTICA DE RESÚMENES Y … Related articles – All 4 versions