Snowball Stemmer 2015


 Snowball (programming language) {related:}

Notes:

Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form.  Snowball is a small string processing programming language designed for creating stemming algorithms for use in information retrieval.  Semantic annotation is the process of annotating resources with semantic metadata.

  • Knowledge harvesting
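
As a rough illustration of what "reducing words to a stem" means, here is a minimal sketch of naive suffix stripping in Python. It is not the Snowball or Porter algorithm, which apply many ordered, context-sensitive rules. The suffix list, the minimum stem length, and the name naive_stem are assumptions made up for this example.

```python
# Illustrative only: naive suffix stripping, NOT the Snowball or Porter algorithm.
# SUFFIXES and min_stem_len are hypothetical choices made for this sketch.
SUFFIXES = ("ing", "ed", "ly")


def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Lowercase the word and strip the first matching suffix,
    but only if at least min_stem_len characters would remain."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem_len:
            return lowered[: -len(suffix)]
    return lowered


if __name__ == "__main__":
    for w in ["talked", "talking", "quickly", "sing"]:
        print(w, "->", naive_stem(w))
```

The minimum-length guard keeps short words such as "sing" or "red" from being truncated to meaningless fragments. Real stemmers handle this with more careful conditions on the remaining stem.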

Resources:

Wikipedia

References:

See also:

Porter Stemmer | Stemmer & Dialog Systems | Stemming Algorithms


Using noun phrases extraction for the improvement of hybrid clustering with text- and citation-based components. The example of “Information System Research” B Thijs, W Glänzel, M Meyer – Proc. of the Workshop Mining Scientific …, 2015 – kar.kent.ac.uk … Stop words are removed through a custom built stop word list and remaining terms were stemmed by the Snowball Stemmer available in Lucene which is an extended version of the original Porter Stemmer (Porter, 1980). All terms that occur in only one document are removed. … Cited by 3 Related articles All 5 versions

An Overview of Pre-Processing Text Clustering Methods. D Sailaja, MV Kishore, B Jyothi, N Prasad – International Journal of …, 2015 – Citeseer … Some examples of the rules include: • if the word ends in ‘ed’, remove the ‘ed’ • if the word ends in ‘ing’, remove the ‘ing’ • if the word ends in ‘ly’, remove the ‘ly’ Snowball Stemmer: Snowball is a small string processing programming language designed for creating stemming … Cited by 1

SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data C Pang, A Sollie, A Sijtsma, D Hendriksen… – …, 2015 – database.oxfordjournals.org … Example of how to upload a coding system and a coding/recoding target. To match data values efficiently, we used the Lucene search index with the default snowball stemmer and a standard filter for stemming and removing stop words. … Cited by 1 Related articles All 8 versions

Detecting Events and Sentiment on Twitter for Improving Urban Mobility. A Candelieri, F Archetti – ESSEM@AAMAS, 2015 – ceur-ws.org … they are not considered. Furthermore, the impact of applying – or not – stemming has also been considered, by using Snowball Stemmer (http://trimc-nlp.blogspot.it/2013/08/snowball-stemmer-for-java.html). The following pre … Cited by 1 Related articles All 5 versions

A simple algorithm for the problem of suffix stripping BP Pande, P Tamta, HS Dhami – International Journal of …, 2015 – Wiley Online Library … Cited by 2 Related articles All 2 versions

Sarcasm Detection in Social Media A Signhaniya, G Shenoy, R Kondekar – rohitkondekar.github.io … We also tried generating Ngrams after stemming each word using snowball stemmer. … We used Textblob library. Stemmed N-Grams Each word is stemmed using conservative SnowBall stemmer and then used as ngrams features. Table 3: Description of word feature set … Related articles

Semantic annotation toolkit (first version) I Roberts, F Axelsson, Z Varju, L Dolamic, C Boyer… – kconnect.eu … the Stanford tagger. For morphological analysis we tried the Snowball stemmer provided with GATE, and also the Morfette tool9, for which a custom GATE wrapper needed to be written. 4.2 Stopword list Two different French … Related articles

Towards Knowledge-Intensive Software Engineering SR Cauvin, D Sleeman… – Proceedings of the …, 2015 – pdfs.semanticscholar.org … As stemming is a fairly common operation, a stemmer tool was sought to avoid reimplementation. The Snowball Stemmer5, specifically the Modified Snowball Stemmer6 which adds compatibly for Java 1.6 and up, was chosen as it returns a single stem. … Related articles All 2 versions

Suffix Stripping Problem as an Optimization Problem P Tamta, BP Pande, HS Dhami – Glottotheory, 2015 – degruyter.com … The effectiveness of current work is being concluded on the basis of the outputs obtained for 100 randomly chosen English words, and comparing them with outputs of Porter’s Snowball stemmer (Porter 2001) (Appendix A.1). We are comparing our outputs with that of Porter’s … Related articles

Teaching the IR Process Using Real Experiments Supported by Game Mechanics T Wilhelm-Stein, M Eibl – International Conference of the Cross-Language …, 2015 – Springer … Experiments are set up by configuring two processes: Indexing and retrieval. The user has to choose from a variety of components to create an index and search it. – Lower case filter, – Snowball stemmer, – n-Gram stemmer, and – Stop words removal … Related articles

Simulated and Self-Sustained Classification of Twitter Data based on its Sentiment SN Vinithra, SJA Selvan, MA Kumar… – Indian Journal of …, 2015 – search.proquest.com … KH coder made with Perl language and utilizes R, MySQL, GATE POS tagger for Twitter and snowball stemmer as backside tool1. … Step 3: – Stemming – by snowball stemmer. For eg Stemming include snooping (gerund), snoops (verb) from snoop (root word). … Related articles All 3 versions

Language Resources and Linked Data: A Practical Perspective T Flati, C Baron, M Dojchinovski – Knowledge Engineering and …, 2015 – books.google.com … eu20, Snowball Stemmer and OpenNLP21. The following URL exemplifies the annotation of the string “I’m connected.” using the Snowball Stemmer NIF-WS implementation. http://snowball.nlp2rdf.aksw.org/snowball?f=text&i=I’m+connected. … Related articles

Source code retrieval on StackOverflow using LDA A Arwan, S Rochimah, RJ Akbar – … Technology (ICoICT), 2015 …, 2015 – ieeexplore.ieee.org … We employ Snowball stemmer, since this tool already widely used by many researchers[12,13,16] to stemming word. … For stemming we also create java application using Snowball stemmer library (we choose porter algorithm). … Related articles

Obtaining SMT dictionaries for related languages M Rios, S Sharoff – ACL-IJCNLP 2015, 2015 – anthology.aclweb.org … target word wt. The rs and rt are the stems produced by the Snowball stemmer2. Since the Snowball stemmer does not support Ukrainian and Bulgarian, we used the Russian model for making the stem/ending split. es, et are … Related articles All 7 versions

Stack Exchange Tagger S Mehta, S Sodhani – arXiv preprint arXiv:1512.04092, 2015 – arxiv.org … A lot of algorithms are available for stemming. The prominent ones include the porter stemmer, the snowball stemmer and the lancaster stemmer. Porter stemmer is the most comman algorithm and consists of 5 phases of word reduction that are applied sequentially. … Related articles All 3 versions

Corpus based evaluation of stemmers I Endrédy – 2015 – real.mtak.hu … Basically it is used for spell checking, but it can lemmatize as well. Hunspell lexicons were created for several languages. The Snowball stemmer is an algorithmical one, it can handle many languages, and it is available in several applications. … Related articles All 3 versions

A Framework for the Analysis of Electronic Health Record Narrative Texts based on Big Data Technologies WA Calderón, AP Quimbaya, RA Gonzalez, OM Munoz – 2015 – researchgate.net … Lemmatisation Snowball: It reduces a word to its lemma, root or canonical form using the snowball stemmer algorithm. … (a) libstemmer-1.0.jar: Library with the Snowball Stemmer implementation. (b) stanford-corenlp-3.4.1.jar: Natural Language Processing library. … Related articles

Stemming as a preprocessing step for automatic document timestamp inference L Leppänen – clt.loezi.fi … string “NUMERAL”. The second preprocessor “Snowball” utilizes the Java version (Boulton, 2002) of the Snowball stemmer by Porter (2001). The preprocessor first … Future work This study used the Snowball stemmer (Porter, 2001; Boulton, 2002) which … Related articles

Information processing and management using citation network and keyword analysis to perform a systematic literature review on Green Supply Chain … F Strozzi, C Colicchia – Journal of scientometric research, 2015 – jscires.org … The process of normalization implemented in Sci2 separates the text into tokens word, normalizes them in lowercase, it removes the s at the end of words, remove the dots from acronyms, delete the stop words, and applies the English Snowball stemmer (http://snowball.tartarus … Cited by 1 Related articles All 4 versions

Supervised Named Entity Recognition for Clinical Data D Jain – 2015 – ceur-ws.org … 1. Microsoft Bing translator was used for translating each french word (separately) into English. 2. Snowball stemmer was used for the stemming during pre-processing step 3. CRFSuite tagger was used to train the model based on the training data and for tagging the test files. … Cited by 2 Related articles All 3 versions

A Study on Search Term Based Recommendations S Shin – dev02.dbpia.co.kr … endings were removed. For English, Stopwords provided by NLTK was matched for removal, and only the term ending was extracted using the Snowball Stemmer library. 3) For terms with high term frequency such …

Classification and Selection of Translation Candidates for Parallel Corpora Alignment KM Kavitha, L Gomes, J Aires, JGP Lopes – Portuguese Conference on …, 2015 – Springer … checks if the terms in the key-word tree7 occur as (sub-)expressions in the bilingual pair to be validated and if they occur are accepted translations [9]. Similarly, to find the stemmed coverage, we use the stemmed training and test datasets, obtained using the Snowball stemmer … Related articles

DBLPminer: A Tool for Exploring Bibliographic Data T Le, D Zhang – Information Reuse and Integration (IRI), 2015 …, 2015 – ieeexplore.ieee.org … and results. DBLPminer uses the English Snowball stemmer [4] for bibliography indexing. IV. … Stemming operations are performed using the English Snowball Stemmer via PyStemmer library. Stopword removal is assisted by a text file containing a line delimited list of stopwords. … Related articles All 2 versions

Evaluating Geographical Knowledge Re-Ranking, Linguistic Processing and Query Expansion Techniques for Geographical Information Retrieval D Ferrés, H Rodríguez – … Symposium on String Processing and Information …, 2015 – Springer … based IR). They used POS tagging (TreeTager), stopwords filtering and Snowball Stemmer to process a thematic index of the collection. A geographical index was built with Geo-NER to recognize geographical entities. The … Cited by 1 Related articles All 2 versions

Tag Recommender for StackOverflow Questions MK Moharana – pdfs.semanticscholar.org … were removed. I tried using word stemming to reduce words to their basic form using the NLTK [Loper and Bird 2002] Snowball stemmer, but didn’t use it in the end due to slight decrease in cross-validation accuracy. The body … Related articles All 2 versions

Prompter L Ponzanelli, G Bavota, M Di Penta, R Oliveto… – Empirical Software …, 2015 – Springer … To better understand this process, we show an example of query creation. Listing 1 shows a Java method from which the Query Service has to extract a query. The method is making use of a library applying the Snowball Stemmer4 on a set of tokens. … Related articles All 3 versions

A survey on semantic document clustering MP Naik, HB Prajapati, VK Dabhi – Electrical, Computer and …, 2015 – ieeexplore.ieee.org A Survey on Semantic Document Clustering Maitri P. Naik Department of Information Technology, Dharmsinh Desai University, Nadiad, India. maitri.naik@gmail.com Harshadkumar B. Prajapati Department of Information … Cited by 1 Related articles All 2 versions

Automatic indexing of journal abstracts with latent semantic analysis JR Adams, S Bedrick – International Conference of the Cross-Language …, 2015 – Springer … sentences and then words. As part of pre-processing, we remove common words (such as ‘or’ and ‘not’) found in the standard NLTK English word list, and then apply the NLTK implementation of the Snowball stemmer. We next use … Related articles All 2 versions

Package ‘tm’ I Feinerer, K Hornik, MI Feinerer – Corpus, 2015 – safesteps.com Package ‘tm’ January 13, 2014 Title Text Mining Package Version 0.5-10 Date 2014-01-07 Depends R (>= 3.0.0) Imports parallel, slam (>= 0.1-31) Suggests filehash, proxy, Rcampdf, Rgraphviz, Rpoppler, SnowballC, XML … Cited by 11 Related articles All 255 versions

C2F: A Clustering Based Collaborative Filtering approach for recommending product to ecommerce user R Gowri, A Kumar, MJ Arvind – Computation of Power, …, 2015 – ieeexplore.ieee.org … In order to avoid negative gain over Explanation similarity among services, different stemming algorithm could be deployed. Among different stemming algorithm (The Lovins Stemmer, The Dawson Stemmer, Porter Stemmer, Snowball Stemmer etc.). …

Semi-supervised multi-view sentiment analysis G Lazarova, I Koychev – Computational Collective Intelligence, 2015 – Springer … English views the following pipeline was created: Document Reset – it enables the document to be reset to its original state, by removing all the annotation sets and their contents; ANNIE Tokeniser – splits the text into very simple tokens; Snowball stemmer /BulStem – English … Cited by 2 Related articles All 2 versions

Tag cloud visualisation of verbal discussions following speech-to-text R Visser, BOK Intelligentie – 2015 – esc.fnwi.uva.nl … Once more, a NLTK package was used: stem. This package allows for one to choose which stemmer they want to make use of. The original, most commonly used Porter stemmer is available, but also the Snowball stemmer, which was previously mentioned in section 2.4. … Related articles

Clinical abbreviation disambiguation using neural word embeddings Y Wu, J Xu, Y Zhang, H Xu – ACL-IJCNLP 2015, 2015 – aclweb.org … We used the Snowball Stemmer from the python NLTK (Natural Language Toolkit) package to stem the words; 2). Word feature with direction-The relative direction (left side or right side) of stemmed words in feature set 1 towards the target abbreviation; 3). Position feature … Cited by 6 Related articles All 8 versions

How can i improve my app? classifying user reviews for software maintenance and evolution S Panichella, A Di Sorbo, E Guzman… – … (ICSME), 2015 IEEE …, 2015 – ieeexplore.ieee.org … two steps: 1) Preprocessing: all terms contained in our set of user reviews are used as an information base to build a textual corpus that is preprocessed applying stop-word removal (using the english standard stop-word list) and stemming (English Snowball Stemmer) to reduce … Cited by 18 Related articles All 3 versions

TwitterHawk: A Feature Bucket Approach to Sentiment Analysis W Boag, P Potash, A Rumshisky – SemEval-2015, 2015 – anthology.aclweb.org … lowercased. Next, we replace URLs, user mentions, and numbers with generic URL, USER, and NUMBER tokens, respectively. The remaining tokens are stemmed using NLTKs Snowball stemmer (Bird et al., 2009). We also … Cited by 3 Related articles All 12 versions

A Semantic Approach To Analyze Scientific Paper Abstracts IC PARASCHIV, Ş TRĂUŞAN, P DESSUS… – … of» eLearning and …, 2015 – ceeol.com … found inside a dictionary. After tokenization and splitting, the text is inserted into computational vectors, which are reduced to their morphological unit using stemming (i.e., Snowball stemmer [12]). Finding named entities inside … Related articles All 12 versions

Comparing summarisation techniques for informal online reviews M McNeill, R Raeside, M Graham… – … (IC3K), 2015 7th …, 2015 – ieeexplore.ieee.org … Several cleaning steps are common to both: all text was converted to lowercase, punctuation was removed. Words were stemmed using the Snowball stemmer. Stopwords on the SMART stopword list were removed (Buckley 1985). Also, sparse words were removed. …

Classification of drugs reviews using W-LRSVM model AS Manek, K Pandey, PD Shenoy… – 2015 Annual IEEE …, 2015 – ieeexplore.ieee.org … In this process, a series of sub tasks are involved as follows: Each reviews is completely read and words occurring in the review undergo tokenization, case transformation, stemming using Porter’s algorithm and Snowball stemmer operators and finally all the English stop words … Related articles

Sentiment Analysis on Social Media and Online Review R Singh, R Kaur – International Journal of Computer …, 2015 – search.proquest.com … stem. For example, talked, talking, talks as based on the root word talk. We have used Snowball stemmer to reduce the derived word to their origin. Now for evaluating the result, different parameter are to be calculated. True … Related articles All 3 versions

It’s a man’s wikipedia? assessing gender inequality in an online encyclopedia C Wagner, D Garcia, M Jadidi, M Strohmaier – arXiv preprint arXiv: …, 2015 – arxiv.org … 2013). An open-vocabulary approach is not limited to predefined word lists, but linguistics are automatically determined from the text. We compute the tfidf scores of the word stems obtained from a Snowball Stemmer and use them as features to train a Naive Bayes classifier. … Cited by 19 Related articles All 8 versions

Improving Information Acquisition via Text Mining for Efficient E-Governance AB Adeyemo, AK Ojo – 2015 – dline.info … Stopwords, such as ‘a’, ‘an, ‘and’, ‘the’, and so on, were also removed/filtered from the entire document. The remaining tokens were finally stemmed down using Snowball stemmer. These were then clustered together using K-means clustering algorithm. … Related articles

OntoMate: a text-mining tool aiding curation at the Rat Genome Database W Liu, SJF Laulederkind, GT Hayman… – …, 2015 – database.oxfordjournals.org … ANNIE is a set of basic information extraction libraries. We exported ontology terms from the RGD database and stemmed them using Snowball Stemmer to build term dictionaries. Article texts are also stemmed before running through the pipeline. … Cited by 3 Related articles All 10 versions

[BOOK] NLTK essentials N Hardeniya – 2015 – books.google.com … NLTK Essentials: Build cool NLP and machine learning applications using NLTK and other Python libraries. Nitin Hardeniya. Packt … All 6 versions

Understanding student language: An unsupervised dialogue act classification approach A Ezen-Can, KE Boyer – JEDM-Journal of Educational …, 2015 – educationaldatamining.org … the vocabulary of the corpus under consideration. We use the Snowball stemmer in this work1. Another consideration for preprocessing raw natural language dialogue data is how to represent special entities within the utterances. … Cited by 4 Related articles All 7 versions

A Bayesian approach for incorporating expert opinions into decision support systems: A case study of online consumer-satisfaction detection K Coussement, DF Benoit, M Antioco – Decision Support Systems, 2015 – Elsevier … matrix. An example is the word “inspect,” which is the stem for the variants “inspected,” “inspecting,” “inspection,” and “inspections.” We used the Snowball Stemmer, which is the most well-known, affix-removal stemmer [51]. … Cited by 4 Related articles All 6 versions

Overview of the cancer genetics and pathway curation tasks of bionlp shared task 2013 S Pyysalo, T Ohta, R Rak… – BMC …, 2015 – bmcbioinformatics.biomedcentral. … Cited by 5 Related articles All 9 versions

A Hybrid Approach to General Information Extraction M Grap – 2015 – pdfs.semanticscholar.org … This project uses the output of the part-of-speech tagger to help with the generation of rules and features from text. • Stemmer: This project uses NLTK’s Snowball stemmer to make words more generic by reducing them to their roots. … Related articles All 2 versions

Model-dependent software evaluation of text-processing tools T Kaspers – 2015 – kola.opus.hbz-nrw.de … languages. The Snowball Stemmer4 is the stemmer which is suggested by Martin Porter since it derived from the Porter Stemmer and has a higher efficiency. The Snowball Stemmer is available for different languages 3http://tartarus … Related articles All 2 versions

On Using Statistical Semantic on Domain Specific Information Retrieval N Rekabsaz – publik.tuwien.ac.at … However, stemmers are typically easier to implement and also faster to run, but as the examples show, they have less accuracy in comparison to lemmatization. There exist a variety of English stemmers, such as Porter stemmer, Lancaster Stemmer, and Snowball Stemmer [35]. … Related articles

Report On The 20Th STI Conference AA Salah – 2015 – issi-society.org © International Society for Scientometrics and Informetrics 64 ISSI e-Newsletter (ISSN 1998-5460) is published by ISSI (http://www.issi-society.org/). Contributors to the newsletter should contact the editorial board by e-mail. … Related articles

Real-Time Topic and Sentiment Analysis in Human-Robot Conversation E Russell – 2015 – epublications.marquette.edu … package NLTK provides a wide variety of text-processing tools to accomplish these tasks [61]; the tools used in this project are described here. NLTK’s Snowball Stemmer is used on lists of words to remove stopwords and to obtain stemmed versions of non-stopwords. … Related articles All 2 versions

Characterizing the discussion of antibiotics in the Twittersphere: what is the bigger picture? RL Kendra, S Karki, JL Eickholt… – Journal of medical Internet …, 2015 – ncbi.nlm.nih.gov … presence of a URL. Specifically, the text of each tweet was stemmed using a Snowball stemmer included in the Natural Language Toolkit [32] and the presence or absence of several common stems was encoded. To identify the … Cited by 3 Related articles All 7 versions

Mining User Opinions in Mobile App Reviews: A Keyword-Based Approach (T) PM Vu, TT Nguyen, HV Pham… – … (ASE), 2015 30th IEEE/ …, 2015 – ieeexplore.ieee.org … table of irregular verbs [13]. Otherwise, stemming rules in Table VI will be applied directly on the word. Those rules are designed based on English grammar and some rules of the Snowball stemmer. As seen in the table, we … Cited by 7 Related articles All 4 versions

Translation memory retrieval methods M Bloodgood, B Strauss – arXiv preprint arXiv:1505.05841, 2015 – arxiv.org … We call the resulting set the valid segments. For the purpose of computing match statistics, for French corpora we remove all punctuation, numbers, and scientific symbols; we case-normalize the text and stem the corpus using the NLTK French snowball stemmer. … Cited by 6 Related articles All 13 versions

Document clustering with nonparametric hierarchical topic modeling KH Schaefer – 2015 – repositories.lib.utexas.edu … ‘appl’. For Python, the nltk.stem package1 provides a variety of stemming algorithms, specifically Porter’s Snowball Stemmer (Porter, 2001). This stemmer not only has a low error rate compared to other truncators, but was built from an algorithm that … Related articles

How Automatic Coding of Short Text Responses Is Able to Enhance Assessment F Zehner, C Saelzer, F Goldhammer – Annual Conference of the …, 2015 – researchgate.net How Automatic Coding of Short Text Responses Is Able to Enhance Assessment AERA, Chicago, IL – April 17, 2015 Zehner, F., Sälzer, C., & Goldhammer, F. (in press). Automatic Coding of Short Text Responses via Clustering in Educational Assessment. … Cited by 1 Related articles

Moody music generator: Characterising control parameters using crowdsourcing M Scirea, MJ Nelson, J Togelius – … and Biologically Inspired Music and Art, 2015 – Springer … steps on the data. First, we stem the words using the Snowball stemmer, in order to aggregate minor part-of-speech variations of labels—for example, relaxed and relaxing are both mapped to the stem relax. In addition, we … Cited by 4 Related articles All 9 versions

Smirking or Smiling Smileys?: Evaluating the Use of Emoticons to Determine Sentimental Mood E Lousseief, T Hindersson – 2015 – diva-portal.org Degree project in computer science, first level, Stockholm, Sweden 2015. Smirking or Smiling Smileys? Evaluating the Use of Emoticons to Determine Sentimental Mood. Tobias Hindersson and Elias Lousseief … Related articles

Localization of real world regression Bugs using single execution D Cohen, A Yehudai – arXiv preprint arXiv:1505.01286, 2015 – arxiv.org … taxonomy tree. The tree represents relations such as synonymy and hypernymy. See more details in [10]. 4) For other terms, that are not covered by WordNet, we use Snowball stemmer to determine equivalence. 5) WordNet … Related articles All 4 versions

Sequential pattern mining techniques applied to the prediction of financial events: Mercurio project M MORLACCHI BONFANTI, S MARCHESINI – 2015 – politesi.polimi.it Politecnico di Milano Scuola di Ingegneria Industriale e dell’Informazione Corso di Laurea Magistrale in Ingegneria Informatica Dipartimento di Elettronica, Informazione e Bioingegneria Sequential pattern mining techniques applied to the prediction of financial events: … Related articles All 2 versions

Simple learning and compositional application of perceptually grounded word meanings for incremental reference resolution C Kennington, D Schlangen – Proceedings of the Conference …, 2015 – anthology.aclweb.org … sr? RD.ST (r) • how well does the rr model work (together with the sr)? RD.ST with DD.ST (rr) Words were stemmed using the NLTK (Loper and Bird, 2002) Snowball Stemmer, reducing the vocabulary size to 1306. … Cited by 18 Related articles All 8 versions

Mixed language Arabic-English information retrieval M Mustafa, H Suleman – … Conference on Intelligent Text Processing and …, 2015 – Springer … Kwok’s approximation, was used in all experiments. Arabic words were lightly stemmed using the LIGHT-10 stemmer, whereas the English words were stemmed by the SNOWBALL stemmer. In the experiments, an extension of … Cited by 1 Related articles All 2 versions

Building a Scientific Concept Hierarchy Database (SCHBASE) E Adar, S Datta – Ann Arbor – Citeseer … We further normalize our keyphrases by lowercasing, removing hyphens, and using the Snowball stemmer (Porter, 2001) to merge plural variants. After stemming and normalizing, we found a total of 155,957 unique abbreviation expansions. … Cited by 2 Related articles All 10 versions

An approach towards semi-automated biomedical literature curation and enrichment for a major biological database F Rinaldi, O Lithgow-Serrano, A López-Fuentes… – Polibits, 2015 – scielo.org.mx … different strategies. In the current version, the processing sequence is the following: – Apply Part-Of-Speech (POS) tagging using the Stanford POS tagger [15]. – Apply stemming using the snowball stemmer [16]. – Apply a rule … Cited by 1 Related articles All 6 versions

Analyzing the Semantic Relatedness of Paper Abstracts: An Application to the Educational Research Field IC Paraschiv, M Dascalu… – … on Control Systems …, 2015 – ieeexplore.ieee.org … After the text is split and tokenized into vectors of words, all the components are reduced to their morphological unit using stemming (i.e., Snowball stemmer [15]) and their dictionary base form by using lemmatization. Another … Cited by 3 Related articles All 5 versions

Topic aspect-oriented summarization via group selection H Fang, W Lu, F Wu, Y Zhang, X Shang, J Shao… – Neurocomputing, 2015 – Elsevier The summarization is desirable to efficiently apprehend the gist of the huge amount of data and becomes a significant challenge in many applications such as new. Cited by 8 Related articles All 2 versions

Mwi-sum: A multilingual summarizer based on frequent weighted itemsets E Baralis, L Cagliero, A Fiori, P Garza – ACM Transactions on …, 2015 – dl.acm.org … Furthermore, a stemming algorithm is applied to reduce document words to their base or root form (i.e., the stem). More specifically, the Snowball stemmer [Bird et al. 2009] is used for the English language, while the Lucene stemmer [McCandless et al. … Cited by 5 Related articles

Text Mining for Studying Management’s Confidence in IPO Prospectuses and IPO Valuations A Deokar, J Tao – 2015 – aisel.aisnet.org … as follows. The Snowball stemmer is used to pick up the stem of each word; while APOLDA and onto-gazetteers are used to provide the semantic annotations based on the terms contained in the prospectus ontology. Next, the … Related articles

Information Seeking and Responding Networks in Physical Gatherings: A Case Study of Academic Conferences in Twitter X Wen, YR Lin – Proceedings of the 2015 ACM on Conference on …, 2015 – dl.acm.org … Then, we applied Snowball Stemmer for word stemming. Feature Engineering. We used N-grams as our features. N-gram is a sequence of terms in the text and has been proved to be useful features in text classification tasks [21, 25, 40]. … Related articles All 2 versions

Detecting Risks in the Banking System by Sentiment Analysis C Nopp, A Hanbury – Proceedings of the EMNLP 2015, 2015 – cs.cmu.edu … experiment. Hence, all words which do not appear in the first experiment’s dictionaries are removed. The second approach utilizes a Snowball Stemmer to ensure that different versions of the same word are treated as equal. Its … Related articles All 10 versions

Wikipedia based Query Expansion for Searching in Norwegian LPT Johnsen – 2015 – bora.uib.no … for the rest of the document has been used. This list is compiled by the creators of the snowball stemmer, and revised by Jan Bruusgaard (Bruusgaard, 2005) The Snowball stemmer will be explained in section 2.3.3. There … Related articles All 3 versions

Reconciling heterogeneous descriptions of language resources JP McCrae, P Cimiano, VR Doncel, D Vila-Suero… – ACL-IJCNLP …, 2015 – aclweb.org … First we tokenized the expressions, then we stemmed the tokens using the Snowball stemmer (Porter, 2001), and we performed a string inclusion match, i.e. checking whether META-SHARE usages are included in the free text entries. … Cited by 4 Related articles All 11 versions

Mining text-enriched heterogeneous information networks M Grčar – Jožef Stefan International Postgraduate School, 2015 – kt.ijs.si Mining Text-Enriched Heterogeneous Information Networks. Miha Grčar. Doctoral Dissertation, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, June 2015 Evaluation Board: Asst. Prof. … Cited by 3

On the influence of training data quality on text document classification using machine learning methods J Saarikoski, H Joutsijoki, K Jarvelin… – … and Data Mining, 2015 – inderscienceonline.com … First, all digits and special characters and symbols were deleted from the text data by replacing them with whitespace characters. After that, the SNOWBALL stemmer (Porter, 2001) was run to transform words to their stems. For instance, the word ‘thinking’ was stemmed to ‘think’. … Related articles All 2 versions

Subsegment recall in Translation Memory–perceptions, expectations and reality K Flanagan – The Journal of Specialised Translation, 2015 – jostrans.org … The most frequent N-grams were examined manually to select suitable candidates for subsegment recall testing, eliminating N-grams with too many stop words, using the stop-word lists for French and English included in the Snowball stemmer suite (Porter 2001). … Cited by 3 Related articles All 2 versions

Computation And Evolutionary Algorithms Module II: Classification Of Customer Reviews Using Deep Learning R Ghosh, V Ravi – 2015 – idrbt.ac.in … In order to obtain the document-term matrix, StringToWordVector filter of Weka 3.7.11 was used, which performs tokenization on the basis of specified punctuation marks. Snowball stemmer was used for stemming [33] that brings words into its root word. …

Predicting Opinion Leaders in Word-of-Mouth Communities G Towhidi, AP Sinha – 2015 – aisel.aisnet.org … The baseline model is the bag-of-words model. We applied the standard preprocessing steps to prepare the sample corpus for the bag-of-words model. We used the snowball stemmer, stopword eliminator, and word tokenizer methods. … Related articles

Sentiment Analysis in monitoring software development processes: An exploratory case study on GitHub’s project issues F Jurado, P Rodriguez – Journal of Systems and Software, 2015 – Elsevier … As technical notes, for text processing we used NLTK (Bird et al. 2009). For the stemming process we used the SnowBall stemmer (Porter, 2001), which implements the well-known and widely used Porter’s algorithm. This is … Cited by 7 Related articles All 3 versions

Decoding data analytics capabilities from topic modeling on press releases JC Bonilla, B Rao – 2015 Portland International Conference on …, 2015 – ieeexplore.ieee.org … For stopwords, we applied the stopwords list from the SMART [22] information retrieval system as well as the Snowball stemmer project [23]. These sources list words that have no semantic meaning and contribution to text analysis. … Related articles All 2 versions

Extracting biomedical events from pairs of text entities X Liu, A Bordes, Y Grandvalet – BMC …, 2015 – bmcbioinformatics.biomedcentral. … Related articles All 13 versions

Supporting and accelerating reproducible empirical research in software evolution and maintenance using TraceLab Component Library B Dit, E Moritz, M Linares-Vásquez… – Empirical Software …, 2015 – Springer Cited by 4 Related articles All 7 versions

Package ‘lsa’ F Wild – 2015 – 137.132.33.20 … documents. If specified, simple text preprocessing mechanisms are applied (stemming, stopword filtering, wordlength cutoffs). Stemming thereby uses Porter’s snowball stemmer (from package SnowballC). There are two … Related articles All 230 versions

Compound matching of biomedical ontologies DPS Oliveira – 2015 – repositorio.ul.pt Page 1. UNIVERSIDADE DE LISBOA FACULDADE DE CIÊNCIAS DEPARTAMENTO DE INFORMÁTICA Compound Matching of Biomedical Ontologies Mestrado em Bioinformática e Biologia Computacional Especialização em Bioinformática … Cited by 1 Related articles All 6 versions

Information retrieval from historical newspaper collections in highly inflectional languages: A query expansion approach A Järvelin, H Keskustalo, E Sormunen… – Journal of the …, 2015 – Wiley Online Library … A stemming test conducted showed that the Snowball stemmer for Finnish reduced the number of unique index words in the collection from 7.03 million to 4.87 million (to 69.3% of the original). The corresponding conflation rate … Cited by 4 Related articles All 4 versions

Textual Concept Similarity N Rekabsaz, R Bierig, AL Ginsca, M Lupu, A Popescu… – ifs.tuwien.ac.at … viding the weight of each word by the sum of the scores of all words in the document. Monolingual collections are stemmed using the corresponding Perl Snowball stemmer implementation. 3.4 ENRICHMENT AND RETRIEVAL FRAMEWORK … Related articles All 2 versions

BibSLEIGH: Bibliography of Software (Language) Engineering in Generated Hypertext V Zaytsev – 2015 – grammarware.net … the results of such harvest to public repositories to avoid copyright claims). A word is what we call a stem obtained from a classic Snowball stemmer for English. We use our own lexer that tries to split camelcased words properly … Related articles All 2 versions

Secure Sketch Search for Document Similarity C Orencik, M Alewiwi, E Savas – Trustcom/BigDataSE/ISPA, …, 2015 – ieeexplore.ieee.org … files. The size each file changes from 4 KB to 2 MB. First, the data is cleaned from the mail headers and stop words. The terms are stemmed to their roots using the snowball stemmer included in the Lucene [13] library. We then … Related articles All 4 versions

Fine-tuning SIMPLE based Content Based Image Retrieval system VH Vu, HS Le, O Kanishcheva… – Proceedings of the Sixth …, 2015 – dl.acm.org … clusters coincide with the human defined clusters. To calculate similarity based on textual features, first, all keywords are normalized. We deleted stop words and used stemming (Snowball Stemmer). After this we split the tags into … Related articles

Speaking German, Talking Differently? Using SNA to Uncloak Thematic Pathways in German, Swiss and Austrian New Year’s Addresses K Elo – Work, 2015 – researchgate.net … and is adaptable to other languages if a lexicon and a manually tagged training corpus are available.” (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/ [online], last visited: June 11, 2015) 5 I used the German stop word list from the Snowball stemmer project (http …

Approaches to Automatic Text Structuring N Erbs – 2015 – tuprints.ulb.tu-darmstadt.de Page 1. Approaches to Automatic Text Structuring Vom Fachbereich Informatik der Technischen Universität Darmstadt genehmigte Dissertation zur Erlangung des akademischen Grades Dr.-Ing vorgelegt von M.Sc. Nicolai Erbs geboren in Scherzingen (Schweiz) … Related articles

SMUG: Scientific music generator M Scirea, GAB Barros, N Shaker… – Proceedings of the Sixth …, 2015 – axon.cs.byu.edu … Computational Creativity, Concept Invention, and General Intelligence 1:21. Porter, M., and Boulton, R. 2001. Snowball stemmer. Shaker, N.; Togelius, J.; and Nelson, MJ 2014. Procedural Content Generation in Games: A Textbook and an Overview of Current Research. … Cited by 3 Related articles All 7 versions

Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks Bogdan Dit Cluj-Napoca, Cluj, Romania Master of Science, Wayne State University, 2009 Bachelor of Science, Babeș-Bolyai University (Romania), 2006 … Related articles All 2 versions

Price, Perceived Value and Customer Satisfaction: A Text-Based Econometric Analysis of Yelp! Reviews EA Dwyer – 2015 – scholarship.claremont.edu … from its review texts, restricting our dictionary to only the 10,000 most common stemmed words in the corpus, after stemming words with Porter’s Snowball Stemmer as implemented in the Natural Language Toolkit python library. To … Related articles

Aspect learning for multimedia summarization via non-parametric Bayesian F Wu, H Fang, X Li, S Tang, W Lu, Y Yang, W Zhu… – 2015 – ieeexplore.ieee.org Page 1. 1051-8215 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This … Related articles

Online defect prediction for imbalanced data M Tan, L Tan, S Dara, C Mayeux – Proceedings of the 37th International …, 2015 – dl.acm.org … The bag-of-words feature is a vector of the count of occurrences of each word in the text. We use the snowBall stemmer [23] to group words of the same root and Weka [24] to obtain the bag-of-words features from both the commit messages and the source code. … Cited by 15 Related articles All 9 versions

Legal retrieval as support to eMediation: matching disputant’s case and court decisions S El Jelali, E Fersini, E Messina – Artificial Intelligence and Law, 2015 – Springer Related articles All 8 versions

Machine translation for human translators M Denkowski – 2015 – lti.cs.cmu.edu Page 1. Machine Translation for Human Translators Michael Denkowski CMU-LTI-15-004 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu Thesis Committee: … Cited by 4 Related articles All 8 versions

On the application of Focused Crawling for Statistical Machine Translation Domain Adaptation VP Moreira – 2015 – lume.ufrgs.br Page 1. UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL INSTITUTO DE INFORMÁTICA PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO BRUNO REZENDE LARANJEIRA On the application of Focused Crawling for Statistical Machine Translation Domain … Related articles

Automated Quality Assurance of Non-Functional Requirements for Testability A Rashwan – 2015 – spectrum.library.concordia.ca Page 1. AUTOMATED QUALITY ASSURANCE OF NON-FUNCTIONAL REQUIREMENTS FOR TESTABILITY ABDERAHMAN RASHWAN A THESIS IN THE DEPARTMENT OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING … Related articles All 3 versions

Evaluation of methods for the assessment of minimum required attention K Kircher, C Ahlström – 2015 – diva-portal.org Page 1. Katja Kircher Christer Ahlström Evaluation of methods for the assessment of minimum required attention VTI rapport 872A | Evaluation of methods for the assessment of minimum required attention www.vti.se/en/publications VTI rapport 872A Published 2015 Page 2. … Related articles All 3 versions

Word Sense Disambiguation with GermaNet V Henrich – 2015 – bibliographie.uni-tuebingen.de Page 1. Word Sense Disambiguation with GermaNet Semi-Automatic Enhancement and Empirical Results Dissertation zur Erlangung des akademischen Grades Doktor der Philosophie in der Philosophischen Fakultät der Eberhard Karls Universität Tübingen vorgelegt von … Related articles All 2 versions