Sentence Splitter 2014

Notes:

Every document is split into sentences, which are then parsed.

  • ANNIE regex sentence splitter
  • Moses sentence splitter
  • Regex sentence splitter
  • Sentence splitter heuristic
  • Sentence splitter rules
  • Statistical sentence splitter
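
To make the regex and heuristic approaches listed above concrete, here is a minimal sketch in Python. It is not the ANNIE, Moses, or OpenNLP implementation; the abbreviation list, the boundary pattern, and the function name `split_sentences` are illustrative assumptions only.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); real splitters ship much larger, language-specific lists.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs", "fig", "al"}

# Candidate boundary: one or more of . ! ? (optionally followed by closing
# quotes or brackets), then whitespace.
BOUNDARY = re.compile(r"([.!?]+[\"')\]]*)\s+")


def split_sentences(text):
    """Split text into sentences with a regex plus two simple heuristics."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.end(1)]
        words = candidate.split()
        # Heuristic 1: do not split after a known abbreviation such as "Dr.".
        last_word = words[-1].rstrip(".!?\"')]").lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue
        # Heuristic 2: do not split when the next character is lower case.
        next_char = text[match.end():match.end() + 1]
        if next_char.islower():
            continue
        sentences.append(candidate.strip())
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith went home. He slept."):
        print(sentence)
```

Statistical splitters replace the hand-written heuristics with a classifier trained on sentence-annotated text, which is why several of the tools cited below require a training corpus or a language-specific model.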

Resources:

References:

See also:

Sentence Extractor | Sentence Grammaticality | Sentence Parsers & Dialog Systems | Sentence Splitting & Dialog Systems | Sentence Recognizer


Text mining services for public release I Roberts – annomarket.eu … 2.2.5 Apache OpenNLP pipelines The tokeniser, sentence splitter, POS tagger, phrase chunker and named-entity recogniser from Apache OpenNLP. … Variants of this pipeline are available for English and Dutch: Default annotations: Person … Related articles

Linked Hypernyms Dataset-Generation Framework and Use Cases T Kliegr, V Zeman, M Dojchinovski – The 3rd Workshop on Linked …, 2014 – dojchinovski.mk … the following processing resources: 1. ANNIE English Tokenizer 2. ANNIE Regex Sentence Splitter 3. ANNIE Part-of-Speech Tagger (English), TreeTagger (other languages) 4. JAPE Transducer The hypernym extraction is performed … Related articles

Language Resources and Annotation Tools for Cross-Sentence Relation Extraction S Krause, H Li, F Xu, H Uszkoreit, R Hummel… – Proceedings of the …, 2014 – lrec-conf.org … Sentence segmentation: We employed the sentence splitter for English from Stanford CoreNLP. • Relation mentions: Using a well performing subset of the extraction patterns from (Moro et al., 2013), we automatically marked potential mentions of the three target relations. … Related articles

Improving iUnit Retrieval with Query Classification and Multi-Aspect iUnit Scoring: The IISR System at NTCIR-11 MobileClick Task CT Chang, YH Wu, YL Tsai… – Proc. of NTCIR-11 …, 2014 – research.nii.ac.jp … The lists for rules 6 and 7 are based on pages from Wikipedia. Our PEOPLE list contains all person names under the category page “People” in English Wikipedia. … We use the Apache OpenNLP sentence splitter to separate sentences and filter out irrelevant sentences. … Cited by 1 Related articles

Inventory of Linguistic Processors A Lavelli, IAL EHU – cs.upc.edu … Inventory of Linguistic Processors Page : 12 3.3 Splitter Type: Sentence Splitter Author: UPC Description: Splits a stream of tokens into sentences Languages: Spanish, Catalan, English Portability: Easy. Requires sentence marked training corpus. Requirements: Unix/Linux. … Related articles

The neofonie nerd system at the erd challenge 2014 S Kemmerer, B Großmann, C Müller… – Proceedings of the first …, 2014 – dl.acm.org … After using the OpenNLP sentence splitter and tokenizer (https://opennlp.apache.org/) with an English model, those problems were resolved. Nevertheless, there are many other of these pitfalls relying on bad tokenizing. … Cited by 1 Related articles All 2 versions

Automating Data Abstraction in a Quality Improvement Platform for Surgical and Interventional Procedures M Yetisgen, P Klassen… – … & Methods to …, 2014 – repository.academyhealth.org … Tokenizer Sentence Splitter Section Chunker Concept Tagger … Therefore, it requires that the tokenizer must be run before the sentence breaker. We use the OpenNLP English tokenizer and sentence breaker (http://opennlp.apache.org) to tokenize and split text into sentences. … Related articles All 2 versions

Biomedical text mining for concept identification from traditional medicine literature Z Javed, H Afzal – Open Source Systems and Technologies ( …, 2014 – ieeexplore.ieee.org … GATE comprising tokenizer, sentence splitter, part of speech tagger, a named entities recognizer and coreference tagger. … It takes raw English language text as input and gives the base forms of words, parts of speech tags, named entity recognition, normalize and mark up the …

Disclose Models, Hide the Data—How to Make Use of Confidential Corpora without Seeing Sensitive Raw Data E Faessler, J Hellrich, U Hahn – lrec-conf.org … Component Repository (JCORE) (Hahn et al., 2008) was built as a repository of interoperable UIMA components7 already adapted to the special needs of the analysis of English life sciences … All tools except NLTK’s sentence splitter come with evaluation methods of their own. … Related articles

Extracting structured data from publications in the Art Conservation Domain S Odat, T Groza, J Hunter – Literary and Linguistic Computing, 2014 – ALLC … in the UI. RegEx Sentence Splitter, English Tokenizer, POS Tagger, and VP Chunker from the ‘ANNIE’ plug-in (Cunningham et al., 2002) are used to markup sentences, tokens, PoS, and verb phrases, respectively. The GATE … Related articles All 2 versions

Learning Ontologies from Software Artifacts: Exploring and Combining Multiple Choices K Bontcheva – Semantic Web Enabled Software Engineering, 2014 – books.google.com … Next in the pipeline is the English morphological analyser, which is being used to annotate all words with their root forms (eg, the root of the word “resources” is “resource”). … For example, some terms derived from the source code are ANNIE, sentence, splitter, and gazetteer. …

Enhanced sentiment analysis of informal textual communication in social media by considering objective words and intensifiers J Bhaskar, K Sruthi, P Nedungadi – Recent Advances and …, 2014 – ieeexplore.ieee.org … In this work, at first the documents other than English are converted into English by using standard translation software. … Preprocessing follows the same step as traditional text mining which consists of sentence splitter, POS tagging, stemming and stop word removal. … Cited by 1 Related articles

Expression of laboratory examination results in medical literature T Okumura, E Aramaki, Y Tateisi – The Fourth Workshop on Building …, 2014 – nactem.ac.uk … The remaining 3,369 records were selected for further processing, and the sentence splitter yielded 32,768 sentences … The dictionary for evaluative expressions includes prepositions such as above and below that appear often in ordinary English writings, and adjectives such … Cited by 1 Related articles

Slicepedia: Content-agnostic slicing resource production for adaptive hypermedia K Levacher, S Lawless, V Wade – Computer Science and …, 2014 – doiserbia.nb.rs … GATE NLP algorithms. This tokenizer outputs annotations pointing to the start and end of each token identified within the content processed. Following the tokenizer, is the ANNIE English sentence splitter. This domain and application … Related articles All 7 versions

Using Automatic Morphological Tools to Process Data from a Learner Corpus of Hungarian P Durst, MK Szabó, V Vincze, J Zsibrita – 2014 – jyx.jyu.fi … need for more research to be done, it may also be of interest that recently a natural language processing (NLP) toolkit called magyarlanc (consisting of a sentence splitter, a morphological … Table 2. V-stems Word Gloss English Possible erroneous form 1st character of error code … Related articles

Linguistic Processors and Infrastructure B Magnini, L Bentivogli, A Lavelli, IALEHUJA Batalla… – cs.upc.edu … Basque Catalan English Italian Spanish Language LangId LangId LangId LangId Identifier Lang-id Lang-id Lang-id Tokenizer Tokenizer UPC UPC TokenPro UPC Sussex LEX-Tokenizer CL-Tokenizer Sentence Splitter Splitter Splitter Splitter … Related articles

Ontology based optimization techniques for information retrieval UK Sridevi – 2014 – ietd.inflibnet.ac.in … Gate API is used to annotate the document and ontology population. The ANNIE English Tokeniser splits the … With the ANNIE Gazetteer words are looked up in the gazetteer lists in order to classify them. With the ANNIE Sentence Splitter, the text is split into sentences, which is … Related articles All 4 versions

Term Extraction and Disambiguation for Semantic Knowledge Enrichment: A Case Study on Initial Public Offering (IPO) Prospectus Corpus J Tao, OF El-Gayar, O El, AV Deokar, Y Chang – conferences.computer.org … Note that English language stop words are removed from the term candidates, yet determiners (ie a, an, the) are kept … GATE provides a variety of packaged analytical/processing functionalities (namely Processing Resources, PR), such as Tokenizer, Sentence Splitter, and NP … Related articles

Extraction of Temporal Networks from Term Co-Occurrences in Online Textual Sources M Popović, H Štefančić, B Sluban, PK Novak, M Grčar… – PloS one, 2014 – dx.plos.org … a document and discards all the non-English documents. The model is constructed by a machine learning algorithm, and trained on a large multilingual set of documents. The basic features for model training are the frequencies of several consecutive letters. Sentence Splitter. … Related articles All 14 versions

Syntactic N-gram Collection from a Large-Scale Corpus of Internet Finnish J Kanerva, J Luotolahti, V Laippala… – Proceedings of the …, 2014 – w3.erss.univ-tlse2.fr … [2]. This pipeline consists of a statistical sentence splitter and tokenizer … The corpus is made available in the form of flat and syntactic n-grams, in the same format in which Google recently published their collection of English flat and syntactic n-gram data derived from the Google … Cited by 1 Related articles All 3 versions

Collecting Statistical Information on Noun-Adjective Multiword Expressions for Extracting the Noun-Noun Ones. M Dubremetz – stp.lingfil.uu.se … Ramisch et al. (2010b) provide experi- ments on Portuguese, English and Greek. To the best of our knowledge only Zilio et al. (2011) provide experiments with this tool as well. In … (2011). First we ran the sentence splitter and the tokenizer provided with the Europarl corpus. … Related articles

Creating Summarization Systems with SUMMA H Saggion – 2014 – lrec-conf.org … used since 2010 in different projects and evaluation programmes including the TOPAS Project4 (patent summarization in English, French and … require tokens (eg words) and sentences which can be easily obtained applying the GATE default tokeniser and sentence splitter. … Cited by 1 Related articles

Generation of Software Artifacts and Models at Analysis Phase DM Thakore, RP Patki – academia.edu … After tokenizing each token is stored in separate array list. While tokenizing the English input sentence splitter is used to identify the boundary of each sentence. 2. Tagging This processed text is further given as input to Part Of Speech (POS) tagger to identify the basic POS tags. … Related articles

Protein Name Recognition Based on Dictionary Mining and Heuristics SH Lin, SH Ding, WS Zeng – Algorithmic Aspects in Information and …, 2014 – Springer … These rules are applied to detect protein name fragments as core tokens since many protein name fragments share common morphological features like upper/lower case English alphabets, digits, Greek letters or Roman numbers, etc. … Sentence Splitter Heuristic Rules … Related articles All 2 versions

Semantic event extraction from biological texts using a kernel-based method R Faiz, M Amami, A Elkhlifi – Advances in Knowledge Discovery and …, 2014 – Springer … Hence, we use the state-of-the-art splitter optimized for the biological corpora, namely the GENIA sentence splitter GENIASS. GENIASS [Rune et al., 2007] is based on a supervised learning method using maximum entropy modeling including a set of features: delimiters of the … Cited by 1 Related articles All 2 versions

General Architecture for Text Engineering (GATE) Developer for Entity Extraction: Overview for SYNCOIN M Vanni, A Neiderer – 2014 – DTIC Document … 6 Sentence Splitter, (5) Gazetteer(s), (6) English Tokenizer, (7) Document Reset PR. ‡ The resources can be accessed via either the GATE GUI, as in figure 3 (left), a detail of the upper frame of the Resources pane seen in figure 2, or the Command Line, shown in figure 3 (right). … Related articles

The Hungarian Gigaword Corpus C Oravecz, T Váradi, B Sass – Proceedings of LREC, 2014 – lrec-conf.org … https://lrt.clarin.eu/tools/huntoken-tokenizer-and-sentence-splitter … This is the level of analysis that is encoded for example in the MULTEXT-East specifications … This format is illustrated with a small extract for the phrase “the English language text [is] the primary” in Figure 1. … Cited by 1 Related articles All 3 versions

NewSum:“N-Gram Graph”-Based G Giannakopoulos, G Kiomourtzis… – Innovative Document …, 2014 – books.google.com … com) is an application offering a social media view of news integration, by generating a stream of summarized news. It is multi-document, but applied on a single language (English). … com), which is an English-only summarization solution, provided via a web interface. … Cited by 1 Related articles All 5 versions

Extracting Information for Context-aware Meeting Preparation S Scerri, B QasemiZadeh… – Proceedings of the …, 2014 – aran.library.nuigalway.ie … It consists of an ANNIE Corpus IE Pipeline which includes the standard GATE English tokeniser, sentence splitter, Hepple POS tagger and named entity transducer; in addition to the ANNIE gazetteer lookup and a set of … Related articles All 18 versions

ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus Z Afzal, E Pons, N Kang, MCJM Sturkenboom… – BMC …, 2014 – biomedcentral.com … ContextD: ConText for Dutch The ConText algorithm uses pre-defined English trigger terms to determine the value of the contextual properties. … We used the Dutch sentence splitter in the Apache OpenNLP library [34] to split the text into sentences. … Cited by 1 Related articles All 7 versions

Natural language processing in biomedicine: a unified system architecture overview S Doan, M Conway, TM Phuong… – Clinical …, 2014 – Springer … LSP-MLP was used for processing clinical narratives in English, and it was also extended into other languages such as French, German, and Dutch [ 1 ]. It has been used to map clinical text into SNOMED codes [ 17 , 18 ]. … The sentence splitter breaks sections into sentences. … Cited by 1 Related articles All 8 versions

Mining Bio Medical Literature Using Ontology Based Text Mining M Gayathri, K Mythili, RJ Kannan – caesjournals.org … directly via look-up) • Rule-based semantic annotation • Disambiguation • Final output creation The linguistic pre-processing phase contains GATE components such as tokenization and sentence splitter. It also contains specific tools like Stanford Parser for English part-of … Related articles

SU-FMI: System Description for SemEval-2014 Task 9 on Sentiment Analysis in Twitter B Velichkov, B Kapukaranov, I Grozev… – SemEval …, 2014 – anthology.aclweb.org … for tweet analysis that are already available in GATE (Bontcheva et al., 2013) such as a Twitter tokenizer, a sentence splitter, a hashtag … We use the word clusters built by CMU’s NLP toolkit, which were produced over a collection of 56 million English tweets (Owoputi et al., 2012 … All 8 versions

Adapting taggers to Twitter with not-so-distant supervision B Plank, D Hovy, R McDonald, A Søgaard – 2014 – aclweb.org … Most annotated corpora for English are newswire corpora. Some annotated Twitter data sets have been made available recently, described next. POS NER … (2011) and Owoputi et al. (2013); websites were processed by applying the Moses sentence splitter.16 The out-of … Cited by 1 Related articles All 7 versions

Healthcare Decision Support System for Administration of Chronic Diseases JI Woo, JG Yang, YH Lee… – Healthcare informatics …, 2014 – synapse.koreamed.org … English. Published online 2014 July 31. … The NLPC is configured by a natural language processing module (sentence splitter, part of speech, tagger, parser, and a proper noun recognizer) required for the extraction of knowledge from the collected unstructured data or free text. … Related articles All 11 versions

Intelligent Interface for Textual Attitude Analysis A Neviarouskaya, M Aono, H Prendinger… – ACM Transactions on …, 2014 – dl.acm.org … Alena Neviarouskaya and Masaki Aono, Toyohashi University of Technology; Helmut Prendinger, National Institute of Informatics, Tokyo; Mitsuru Ishizuka, University of Tokyo … Related articles

Multi-Entity Polarity Analysis in Financial Documents JZ Ferreira, J Rodrigues, M Cristo… – Proceedings of the 20th …, 2014 – dl.acm.org … To recognize anaphora, the raw text is first segmented into sentences using a sentence splitter. … For instance, in English, nouns can be identified by recognizing other linguistic structures such as definite pronouns (eg, “Ross bought {a MP3 player / three flowers} and gave {it … Related articles

A multi-phase correlation search framework for mining non-taxonomic relations from unstructured text MK Wong, SSR Abidi, ID Jonsen – Knowledge and information systems, 2014 – Springer … As the proposed approach was tested for Persian texts, the approach is not applicable to English text directly … on processing the collection of text document which is in the form of Portable Document Format (PDF) via two tasks: (a) document preprocessor and (b) sentence splitter. … Related articles All 3 versions

Big data for Natural Language Processing: A streaming approach R Agerri, X Artola, Z Beloki, G Rigau, A Soroa – Knowledge-Based Systems, 2014 – Elsevier … Table 1 shows the modules installed into the English VM. We defined two pipelines for event extraction, each one comprising different modules (last column in the table). Table 1. … Module, Description, Pipeline. ixa-pipe-tok, Tokenizer, sentence splitter, (1, 2). … Related articles All 2 versions

Characterization of toponym usages in texts SJ Wolf, A Henrich, D Blank – Proceedings of the 8th Workshop on …, 2014 – dl.acm.org … 4.1 Existing GATE Plugins GATE is already equipped with a standard tokenizer and sentence splitter for several languages. … of the Stanford Natural Language Parser (NLP) for a sentence dependency analysis is available, but it currently only supports English and Chinese. …

Crowdsourcing named entity recognition and entity linking corpora K Bontcheva, L Derczynski, I Roberts – Handbook of Linguistic …, 2014 – derczynski.com … Firstly, documents are pre-segmented into sentences and word tokens, using GATE’s TwitIE plugin [8], which provides a tokeniser, POS tagger, and a sentence splitter, specifically adapted to microblog content. … Workers could be from any English-speaking nation. … Cited by 2

Unsupervised Ontology Enrichment with Hierarchical Self-Organizing Maps EȘ Chifu, IA Leția – cs-gw.utcluj.ro … to identify such term categories by a linguistic analysis of the corpus documents, our framework relies on several processing resources offered by the GATE framework [8] for analyzing English texts: morphological analyzer (stemmer), tokenizer, sentence splitter, the Hepple part …

New Technology Trends Watch: An Approach and Case Study IV Efimenko, VF Khoroshevsky – Artificial Intelligence: Methodology, …, 2014 – Springer … The total number of texts equals 131,477 documents in English 4 https://gate.ac.uk/ 5 http://tagcrowd.com/ 6 http://protegewiki.stanford.edu/wiki/OntoGraf Web of Science Scopus esp@cenet eLibrary … Morph/POS Tagger Tokenizer rules Sentence Splitter rules Internet G … Cited by 2 Related articles All 4 versions

Optimization Tasks in the Conversion of Natural Language Texts into Function Calls P Barabás, L Kovács – Applied Information Science, Engineering and …, 2014 – Springer … Stanford [7] or NLTK [8], which support mostly English language and several other widely spoken languages. There is a Hungarian toolkit for linguistic processing called “magyarlanc” [9] developed by the University of Szeged, which contains a sentence splitter, a tokenizer, a … Related articles All 3 versions

FactRunner: A New System for NLP-Based Information Extraction from Wikipedia R Sutoyo, C Quix, F Kastrati – Web Information Systems and Technologies, 2014 – Springer … structured information from Wikipedia infoboxes [ 7 ] or the category system [ 8 ]. Our method utilizes the existing metadata present in Wikipedia articles (ie, links between articles) to extract facts with high accuracy from Wikipedia’s English natural language … 3.3 Sentence Splitter. … Related articles All 3 versions

Temporal Expression Recognition Using Dependency Trees P Mazur, R Dale – … Technology Challenges for Computer Science and …, 2014 – Springer … We used the ANNIE sentence splitter from GATE for all parsers, except for Connexor which carries out its own sentence splitting. … Workshop, Delft, The Netherlands, March 2005. 3. Hacioglu, K., Chen, Y., Douglas, B.: Automatic time expression labeling for English and Chinese … Related articles All 10 versions

Anatomical entity mention recognition at literature scale S Pyysalo, S Ananiadou – Bioinformatics, 2014 – Oxford Univ Press … We thus initially segment input text into sentences and those further into tokens. For sentence segmentation, we apply the GENIA sentence splitter trained on the GENIA treebank (Tateisi and Tsujii, 2006) with a heuristic post-processor to correct some common errors. … Cited by 9 Related articles All 8 versions

Someone to Talk To K Koroveshovski, S Gievska – Advances in Affective and …, 2014 – books.google.com … AFINN-1112, a list of 2477 English words and phrases annotated with their valence rating, an integer value between -5 … In particular, sentence splitter, tokenizer, part-of-speech (POS) tagger, and grammatical word tagging (eg, adverbs, adjectives, verbs, pronouns) have been … Cited by 1 Related articles

Semano: Semantic Annotation Framework for Natural Language Resources D Berry, N Nikitina – The Semantic Web–ISWC 2014, 2014 – Springer … In our examples within this section, we assume that documents have been pre-processed using an English tokenizer, a sentence splitter, a part of speech (POS) tagger and an orthoMatcher, which are all included within the standard GATE application ANNIE. … Related articles

Extending a Tool Resource Framework with U-Compare M Rosner, A Attard, P Thompson, A Gatt… – … for Computer Science …, 2014 – Springer … METANET4U, with which the present paper is mainly concerned, deals with Spanish, Portuguese, Maltese, English and Romanian. … 2. Fig. 2. UIMA wrapper. The MLRS tokeniser being considered makes use of a sentence splitter for Maltese and then tokenises each sentence. … Related articles All 4 versions

Requirement boilerplates: Transition from manually-enforced to automatically-verifiable natural language patterns C Arora, M Sabetzadeh, LC Briand… – … Patterns (RePa), 2014 …, 2014 – ieeexplore.ieee.org … Tokenizer Sentence Splitter POS Tagger Named Entity Recognizer … The first step in the pipeline, Tokenization, breaks up the sentence into units called tokens, such as words, numbers, or punctuation. The next component, Sentence Splitter, segments the text into sentences. … Related articles All 7 versions

EXACT2: the semantics of biomedical protocols LN Soldatova, D Nadis, RD King, PS Basu… – BMC …, 2014 – ncbi.nlm.nih.gov … the following (semi-) automated framework for the translation of biomedical protocols expressed in natural language (English) into a … The National (UK) Centre for Text Mining (NaCTeM) provides various text mining tools, including GENIA Sentence Splitter (GeniaSS) optimized … Cited by 1 Related articles All 7 versions

Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery T Kliegr – Web Semantics: Science, Services and Agents on the …, 2014 – Elsevier … Abstract. The Linked Hypernyms Dataset (LHD) provides entities described by Dutch, English and German Wikipedia articles with types in the DBpedia namespace. … For the English dataset, the average accuracy is 0.86, for German 0.77 and for Dutch 0.88. … Cited by 2 Related articles All 17 versions

Modeling Review Argumentation for Robust Sentiment Analysis H Wachsmuth, M Trenkmann, B Stein… – Proceedings of the 25th …, 2014 – uni-weimar.de … Our experiments are based on two English text corpora with reviews from the hotel domain and the movie domain, respectively … Preprocessing For feature computations, we preprocess all texts with a tokenizer, a sentence splitter, and the part-of-speech tagger from (Schmid, 1995 … Cited by 1 Related articles All 6 versions

Identifying duplicate functionality in textual use cases by aligning semantic actions A Rago, C Marcos, JA Diaz-Pace – Software & Systems Modeling, 2014 – Springer … The NLP components output information such as sentences and token boundaries (sentence splitter and tokenizer), token properties (part-of-speech tagger … several textual sentences is a complex activity, mainly due to the expressiveness and ambiguity of the English language … Cited by 1 Related articles

Cross-hospital portability of information extraction of cancer staging information D Martinez, G Pitson, A MacKinlay… – Artificial intelligence in …, 2014 – Elsevier … It applies a preconfigured pipeline, including a sentence splitter, tokeniser, POS-tagger and shallow parser. … 11 We supplied our pathology reports in unaltered form (except for replacing tumour with tumor, to match MedKAT/P’s assumption of US English spelling), but made no … Related articles All 5 versions

Quantifying Irony With Sentiment L ESPINOSA-ANKE – … Research in Applied Linguistics: Issues on …, 2014 – books.google.com … to maturity: happiness, sadness, anger…” (Francisco and Hervás 2007: 8). The corpus consisted in fairy tales both in English and Spanish. … It might be the case that our sentence splitter incorrectly detects a sentence boundary, usually in cases where punctuation marks are used. …

Motif-Based Hyponym Relation Extraction from Wikipedia Hyperlinks B Wei, J Liu, J Ma, Q Zheng, W Zhang… – Knowledge and Data …, 2014 – ieeexplore.ieee.org … work. For example, Snow et al. [28] proposed a method to automatically extract additional lexico-syntactic patterns from WordNet and a large text corpus by analyzing the dependency path of English sentences. Kozareva et al. … Related articles All 8 versions

A query language and ranking algorithm for news items in the Hermes news processing framework F Hogenboom, D Vandic, F Frasincar, A Verheij… – Science of Computer …, 2014 – Elsevier Hermes is a Web-based framework that makes use of many Semantic Web technologies for building personalized news services. Ontologies are employed for knowledge. Cited by 2 Related articles All 4 versions

Assessing automatic text classification for interactive language learning A Branco, J Rodrigues, F Costa… – Information Society (i- …, 2014 – ieeexplore.ieee.org … IMPLEMENTATION Following the design options discussed above, the implementation of this service was based on a number of natural language processing tools whose state of the art performance offers pretty good accuracy: tokenizer, sentence splitter, syllabifier, POS tagger … Related articles

Text Stylometry For Chat Bot Identification And Intelligence Estimation N Ali – 2014 – digital.library.louisville.edu … Lexical Token based (word length, sentence length, etc.) Tokenizer, (Sentence splitter) Vocabulary richness Tokenize … Features Required Tools Syntactic Part-of-Speech Tokenizer, Sentence splitter, POS tagger Chunks Tokenizer, Sentence splitter, [POS tagger], Text chunker … Related articles

ParaLite: A Parallel Database System for Data-Intensive Workflows C Ting, K Taura – IEICE TRANSACTIONS on Information and …, 2014 – search.ieice.org … IEICE Trans. Inf. & Syst., Vol. E97-D, No. 5, May 2014, p. 1211. Ting Chen (Member) and Kenjiro Taura (Nonmember) … Related articles All 4 versions

Sentiment prediction based on dempster-shafer theory of evidence ME Basiri, AR Naghsh-Nilchi… – … Problems in Engineering, 2014 – hindawi.com … This classifier was built especially to cope with sentiment detection in short informal English text [6]. The core of SentiStrength is a lexicon of 2,310 opinion words obtained from several sources [48 … Therefore, the second module of the flowchart in Figure 1 is the sentence splitter. … Cited by 2 Related articles All 2 versions

A decision support system: Automated crime report analysis and classification for e-government CH Ku, G Leroy – Government Information Quarterly, 2014 – Elsevier … and found that, eg, dense-housing communities (eg, apartment complexes) with large number of non-English speakers and … The information-processing layer is comprised of eight components: tokenizer, sentence splitter, POS tagger, stemmer, gazetteer, ortho-matcher, noun … Related articles

Naïve Bayes classifiers for authorship attribution of Arabic texts AS Altheneyan, MEB Menai – Journal of King Saud University-Computer …, 2014 – Elsevier … Several authorship attribution methods were developed for natural languages, such as English, Chinese and Dutch. However, the number of related works for Arabic is limited. … They tested their method on English and Arabic web forum messages. … Related articles

Combined Language Processing Methods and Mash-Up System for Improving Retrieval in Diabetes Related Patents I Chorbev, D Davcev, D Boshnakoska – Multidisciplinary Information …, 2014 – Springer … Each collected patent is processed using a tokenizer, sentence splitter and section recognizer. … It is based on a score that reflects the semantic relation between meanings of the sentences. It is based on WordNet [39] – a large lexical database of the English language. … Related articles All 3 versions

Revolutionary entities: Turning data into knowledge to drive personalized exploration of The irish rising of 1916 O Conlan, A O’Connor, ON Loinsigh… – Big Data (Big Data), …, 2014 – ieeexplore.ieee.org … The English contained in the documents is generally modern and in standard form. … It consists of seven modules, of which four are defaults included with the GATE software. These are: a sentence splitter, a tokenizer, a basic named-entity transducer, and a part-of-speech tagger. … Related articles

Aimed information quantity in text J Chen, H Zhuge – Concurrency and Computation: Practice …, 2014 – Wiley Online Library … Every document is split into sentences, which are then parsed and nouns are kept. The sentence splitter and POS tagger tool we use is StanfordCoreNLP, which provides a set of natural language analysis tools written in Java. … Cited by 1 Related articles

Evaluating techniques for learning non-taxonomic relationships of ontologies from text I Serra, R Girardi, P Novais – Expert Systems with Applications, 2014 – Elsevier … that uses NLP and statistical solutions to extract non-taxonomic relationships of predefined ontology concepts from an English corpus. … Sentence splitter is necessary because the sentence is the linguistic unit from which non-taxonomic relationships are extracted by applying the … Cited by 2 Related articles All 3 versions

A Language Visualization System E Unal – 2014 – denizyuret.com … The aim of this system is to make scene construction task easier by using a strict subset of English and direct manipulation. 3D scene construction can be … 14 2.6 CONFUCIUS CONFUCIUS [14] is a storytelling system that visualizes single English sentences into 3D animations. … Related articles All 2 versions

SkipCor: Skip-Mention Coreference Resolution Using Linear-Chain Conditional Random Fields S Žitnik, L Šubelj, M Bajec – PloS one, 2014 – dx.plos.org … Related articles All 11 versions

Entity query feature expansion using knowledge base links J Dalton, L Dietz, J Allan – Proceedings of the 37th international ACM …, 2014 – dl.acm.org … on either side of a mention, 50 words on either side, or one sentence, where sentence boundaries are determined by a sentence-splitter. … linked entities described in Section 3. Our Wikipedia collection is derived from a Freebase-generated dump of the English Wikipedia from … Cited by 9 Related articles All 6 versions

Does diversity lead to diverse opinions? evidence from languages and stock markets YC Chang, HG Hong, L Tiedens, N Wang… – Rock Center for …, 2014 – papers.ssrn.com … in x. Unlike English where the meaning of each word is usually self-contained, Chinese words typically contain different numbers of characters to carry their meaning. Therefore we use a Chinese sentence splitter software fundannlp to first retrieve the key words in all the … Cited by 1 Related articles All 6 versions

Comparable Study of Event Extraction in Newswire and Biomedical Domains M Miwa, P Thompson, I Korkontzelos, S Ananiadou – aclweb.org … We also employed the GENIA sentence splitter (Sætre et al., 2007) for sentence splitting, and the snowball (Porter2) stemmer4 for stemming. … 2005. NYU’s english ACE 2005 system description. In Proceedings of ACE 2005 Evaluation Workshop, Washington, US. … Related articles All 5 versions

Exploratory Professional Search through Semantic Post-Analysis of Search Results P Fafalios, Y Tzitzikas – Professional Search in the Modern World, 2014 – Springer … 22,16] is a ready-made information extraction system which contains several components (e.g. Tokeniser, Gazetteer, Sentence Splitter, etc.) and … there are natural language approaches that guide users in formulating queries in a language seemingly akin to English and translate … Cited by 1 Related articles All 4 versions

Automated spatiotemporal and semantic information extraction for hazards W Wang – 2014 – ir.uiowa.edu … 12 inform users about the spatiotemporal pattern of hazard-related events. In this research, only text documents in English are considered and the text corpus is primarily concerned with locations in the United States. As part of this research, we employ GIS (ArcGIS 10.1) … Cited by 1 Related articles All 2 versions

Building Linguistic Corpora from Wikipedia Articles and Discussions E Margaretha, H Lüngen – … von/Edited by Michael Beißwenger, Nelleke …, 2014 – jlcl.org … The project described in Ferschke et al.(2012) exploits this fact to construct a corpus of discussions of the Simple English Wikipedia and provides it with dialog act annotations for research on collaborative authoring on the web. … Related articles All 4 versions

Rewiring of the plant interactome in response to environmental stress J VERCRUYSSE – 2014 – buck.ugent.be … English Summary Plants are continually challenged to respond to detrimental changes in their environment in order to avoid destructive effects on growth and development. …

Ontology for Customer Reviews T Kisel – vlebb.leeds.ac.uk … phase (semantic annotation) Semantic annotation links the base form and the enriched form from the ontology entities. It has different modules, e.g. Sentence splitter, named entity recognizer, and orthographic matcher. Semantic augmentation has ontology as a baseline. …

Ripple Down Rules for Question Answering DQ Nguyen, DQ Nguyen, SB Pham – arXiv preprint arXiv:1412.4160, 2014 – arxiv.org … However, these systems are mostly designed for English, therefore, we introduce in this paper such a system for Vietnamese, that is, to the best of our knowledge, the first one made for Vietnamese. … Figure 6. A part of our SCRDR tree for processing English questions. … Related articles All 2 versions

[BOOK] Automated evaluation of text and discourse with Coh-Metrix DS McNamara, AC Graesser, PM McCarthy, Z Cai – 2014 – books.google.com … For example, Louwerse, McCarthy, McNamara, and Graesser (2004) identified significant differences between spoken and written samples of English. … We do not need to conduct a survey to discover how most English speakers will vote. … Cited by 35 Related articles All 3 versions

System Design, draft Deliverable D2. Z Beloki, G Rigau, A Soroa, A Fokkens, P Vossen… – newsreader-project.eu … Thomas Ploeger4, Willem Robert van Hage4 Keywords: system design, big data, scaling NLP Abstract: This deliverable describes the first version of the System Design framework developed in NewsReader to process large and continuous streams of English, Dutch, Spanish … Related articles

Concept Exploration And Discovery From Business Documents For Software Engineering Projects Using Dual Mode Filtering PA MÉNARD – 2014 – espace.etsmtl.ca … In addition, our research is done on documents written in French, for which text analysis tools are less accessible or advanced than those written in English. … (English, French, Spanish, etc.) which are informal by nature, offers a plethora of unsolved challenges. … Related articles All 2 versions

[BOOK] Named entity extraction and disambiguation for informal text: the missing link M Badieh Habib Morgan – 2014 – doc.utwente.nl … Page 26. 1.2 Examples of Application Domains 5 Although state-of-the-art NER systems for English produce near-human performance [2], their performance drops when applied to informal text of UGC where the ambiguity increases. … Related articles All 3 versions

Using Genetic Algorithms for Feature Set Selection in Text Mining BC Rogers – 2014 – rave.ohiolink.edu ABSTRACT USING GENETIC ALGORITHMS FOR FEATURE SET SELECTION IN TEXT MINING by Benjamin Charles Rogers The rationale behind design decisions are often recorded in different project documentation. …

Beyond linear chain: a journey through conditional random fields for information extraction from text D Marcheggiani – 2014 – dspace.unive.it Università Ca’ Foscari di Venezia Dipartimento di Informatica Dottorato di Ricerca in Informatica Ph.D. Thesis Beyond Linear Chain: A Journey through Conditional Random Fields for Information Extraction from Text Diego Marcheggiani Supervisor Dott. … Related articles All 3 versions

Configuring named entity extraction through real-time exploitation of linked data P Fafalios, M Baritakis, Y Tzitzikas – Proceedings of the 4th International …, 2014 – dl.acm.org … FILTER(lang(?label)='en') } Figure 4: Example of a SPARQL query for retrieving a list of English fish names … ANNIE [14, 11] is a ready-made information extraction system which contains several components (e.g. Tokeniser, Gazetteer, Sentence Splitter, Orthographic Coreference … Cited by 2 Related articles All 3 versions

Text mining with semantic annotation: using enriched text representation for entity-oriented retrieval, semantic relation identification and text clustering J Hou – 2014 – eprints.qut.edu.au TEXT MINING WITH SEMANTIC ANNOTATION: USING ENRICHED TEXT REPRESENTATION FOR ENTITY-ORIENTED RETRIEVAL, SEMANTIC RELATION IDENTIFICATION AND TEXT CLUSTERING Mr Jun Hou … Related articles

[BOOK] Perspectives on Ontology Learning J Lehmann, J Völker – 2014 – books.google.com PERSPECTIVES ON ONTOLOGY LEARNING Studies on the Semantic Web www.semantic-web-studies.net Semantic Web has grown into a mature field of research. Its methods find innovative applications on and off the World Wide Web. … Related articles All 4 versions

An automated tool for semantic accessing to formal software models HH Wang, D Damljanovic, J Sun – Science of Computer Programming, 2014 – Elsevier … Users have to refer to various knowledge about GATE, which are documented at different parts by over 100 documents. These documents are in different formats, ranging from plain English text, program code, XML definitions to even audio/video. … Related articles

A noun-based approach to feature location using time-aware term-weighting S Zamani, SP Lee, R Shokripour, J Anvik – Information and Software …, 2014 – Elsevier Feature location aims to identify the source code location corresponding to the implementation of a software feature. Many existing feature location methods app. Cited by 1 Related articles All 2 versions