## Snowball Stemmer 2017

Notes:

Stemming reduces inflected (and sometimes derived) words to a word stem, base, or root form. Snowball is a small string-processing language designed for writing stemming algorithms for use in information retrieval. Semantic annotation is the process of attaching semantic metadata to resources.
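To make the idea of stemming concrete, here is a minimal sketch of a toy suffix stripper in Python. It is not the Snowball or Porter algorithm, which use far more careful, language-specific rules. The function name `toy_stem`, the `SUFFIXES` list, and the `min_stem` parameter are all hypothetical and chosen only for illustration.

```python
# Toy suffix-stripping stemmer: an illustration of the concept only,
# NOT the Snowball/Porter algorithm. All names here are hypothetical.

# Candidate suffixes, checked in this order (longer ones first).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no rule applied


if __name__ == "__main__":
    # Inflected forms of the same word collapse to one stem.
    print([toy_stem(w) for w in ["talked", "talking", "talks"]])
```

A rule list this naive over-stems many words (for example, it would cut "class" to "clas"), which is one reason real stemmers such as those written in Snowball add conditions on what may precede a suffix.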

• Knowledge harvesting

Resources:

Wikipedia

References:

Using word embeddings in twitter election classification
X Yang, C Macdonald, I Ounis – Information Retrieval Journal, 2017 – Springer
… Therefore, we obtain a dataset with good quality human labels. In total, our Venezuela election dataset consists of 5747 Spanish tweets, which contains 9904 unique words after preprocessing (stop-word removal & Spanish Snowball stemmer) …

Predicting the best system parameter configuration: the (per parameter learning) ppl method
J Mothe, M Washha – Procedia Computer Science, 2017 – Elsevier
… On the other hand, Terrier platform that we used in the experimental part implements several stemmers such as Porter Stemmer (PS), English Snowball Stemmer(ESS); thus in our experiments, we consider the stemmer algorithm as a system parameter to be learned …

Preliminary Study on Applying Semi-Supervised Learning to App Store Analysis
R Deocadez, R Harrison, D Rodriguez – Proceedings of the 21st …, 2017 – dl.acm.org
… Parameters and values: Inverse Document Frequency (IDF) Transform: True; Term Frequency (TF) Transform: True; Lower case transformation: True; Minimum term frequency: 5; Stemmer: Snowball stemmer; Number of words to keep: 200 …

Improved lexical similarities for hybrid clustering through the use of noun phrases extraction
B Thijs, W Glänzel, M Meyer – 2017 – lirias.kuleuven.be
… Stop words are removed through a custom built stop word list and remaining terms were stemmed by the Snowball Stemmer available in Lucene which is an extended version of the original Porter Stemmer (Porter, 1980). All terms that occur in only one document are removed …

A comparison of Text Classification methods Method of weighted terms selected by different Stemming Techniques
M Bounabi, K El Moutaouakil, K Satori – Proceedings of the 2nd …, 2017 – dl.acm.org
… In this work, we compare three well-known stemming Techniques: Lovins stemmer, iterated Lovins and snowball Stemmer … The basis algorithms are Lovins stemmer [8], iterated Lovins [9], snowball Stemmer [10] and null stemmer …

CIKM AnalytiCup 2017–Lazada Product Title Quality Challenge: Constructing Features for a Diversified Ensemble of Classifiers
M Nicosia, A Moschitti – cikm2017.org
… Word features. We extract word counts from the lowercased titles. More specifically, we compute the TFIDF of unigrams considering: words, lemmas and stems obtained with (i) the Porter Stemmer, (ii) the Snowball stemmer …

Generalizable Architecture for Robust Word Vectors Tested by Noisy Paraphrases
V Malykh – ceur-ws.org
… 6.3 Word2Vec baseline: For the standard word2vec baseline here we’re taking the model trained on Reuters corpus [Lewis04] by the gensim software package. For this solution we’d used the Snowball stemmer described in [Porter01] …

Transformed document modeling for efficiently searching
HG Nguyen, TTS Nguyen – Information Science and …, 2017 – ieeexplore.ieee.org
… The texts are tokenized and filtered using Apache Lucene Analyzer 5.3.1. Stopwords are removed and terms are stemmed using Stemmer [9]. For example, the word “computer” and “computing” now become “comput” if Snowball Stemmer [11] is used or “compute” if WordNet …

Opinion mining on large scale data using sentiment analysis and k-means clustering
S Riaz, M Fatima, M Kamran, MW Nisar – Cluster Computing, 2017 – Springer
… Case Converted, Stop Word Filter, Snowball Stemmer … Snowball stemmer: Snowball stemmer [25] is applied to stem terms by the stemmer algorithms, this step reduces words to their stem or root form as stemming simplifies the sentiment analysis process …

Citation network analysis for supporting continuous improvement in Higher Education
C Colicchia, A Creazza… – Studies in Higher …, 2017 – srhe.tandfonline.com

CLEF 2017 Microblog Cultural Contextualization Content Analysis Task Overview
L Ermakova, J Mothe, E SanJuan – ceur-ws.org
… run available for active participants. This reference system is based on the Terrier platform. Wikipedia pages in English, French, Spanish, and Portuguese were stemmed by the SnowBall stemmer. The pages retrieved by the …

Morphological Preprocessing for Word Embeddings in Text Classification Tasks
V TUSHKANOV – 2017 – romip.ru
… its raw version. These preprocessing types included stemming with Snowball Stemmer [Porter 2001], lemmatization and lemmatization with POS-tagging using MyStem [Segalovich 2003] with disambiguation on. Models were …

Principal Component Analysis in Topic Modelling of Short Text Document Collections
H Dobrovolskyi, N Keberle – ceur-ws.org
… a threshold value $P_{max}$ where $p(w_i) = \sum_{j=1}^{|W|} p(w_i, w_j)$ (4) … Footnote 1: For instance list of English stop words is available at Snowball stemmer site http://snowball.tartarus.org/algorithms/english/stop.txt … One of the ways to …

Detecting Named Entities and Relations in German Clinical Reports
R Roller, N Rethmeier, P Thomas, M Hübner… – … Conference of the …, 2017 – Springer
… clinical data. POS tags are used for both the CRF and SVM. Additionally, we stem words for jSRE using the German Snowball stemmer in NLTK. CharNER and CNN do not require additional linguistic features as input. For both …

Personalized news conversations with the Softbank Pepper
J Gerbscheid, T Groot, J Wessels, R Wever… – staff.fnwi.uva.nl
… Searching articles according to a given query is done using the list of keywords generated by the keyword extractor as features to describe an article. First, every term in every keyword is reduced to its stemmed form using the Snowball stemmer [9]. According to Peng et al …

Choosing the most reasonable split of a compound word using Wikipedia
Y Le – 2017 – diva-portal.org
… Degree project in computer science and engineering, second cycle, 30 credits. Stockholm, Sweden, 2017. Yvonne Le …

PKU_ICL at SemEval-2017 Task 10: Keyphrase Extraction with Model Ensemble and External Knowledge
L Wang, S Li – Proceedings of the 11th International Workshop on …, 2017 – aclweb.org
… 3.1 Experimental Setup. Preprocessing: We use nltk (Bird, 2006) to segment each paragraph into a list of sentences, tokenize every sentence and then get part-of-speech tag for every token. Snowball Stemmer is used for stemming …

Automatic Coding of PISA Short Text Responses Across Multiple Languages
F Zehner, C Sälzer, F Goldhammer – researchgate.net
… Fabian Zehner, Christine Sälzer, Frank Goldhammer; German Institute for International Educational Research (DIPF), Centre for Technology …

Eve: An Automated Question Answering System for Events Information
I Christanno, P Priscilla, JJ Maulana… – ComTech: Computer …, 2017 – journal.binus.ac.id
… Information Retrieval module selects appropriate documents by checking each keyword in the query and find whether the keywords match with the index in the database or not. Stemming is also done in this module using Snowball Stemmer provided by NLTK …

Comparative Evaluation of Algorithms for Sentiment Analysis over Social Networking Services
A Krouska, C Troussas, M Virvou – Journal of Universal Computer Science, 2017 – jucs.org
… repetitions. Therefore, each text is transformed in a word vector form using the TF-IDF weighting model and applying word tokenization, the Snowball stemmer library and the Rainbow list for stop-words removal except emoticons …

Visual interaction with dimensionality reduction: A structured literature analysis
D Sacha, L Zhang, M Sedlmair, JA Lee… – IEEE transactions on …, 2017 – ieeexplore.ieee.org
… 3.3 Automated Keyword-Based Filtering We implemented a basic NLP pipeline to analyze the initial set of papers. The pipeline parses the full text of each paper, applying a tokenizer and a snowball stemmer implemented from StanfordNLP components1 …

SISA–Automatic Indexing System for Scientific Articles: Experiments with Location Heuristics Rules Versus TF-IDF Rules.
I Gil-Leiva – Knowledge Organization, 2017 – webs.um.es
… Stemmer: SISA uses the Snowball stemmer algorithm to control for the gender and number of words so that these can be considered as a single item (child and children in English, menina and menino in Portuguese or niña and niño in Spanish) …

Using machine learning for labour market intelligence
R Boselli, M Cesarini, F Mercorio… – … European Conference on …, 2017 – Springer
… according to the following steps: (i) html tag removal, (ii) html entities and symbol replacement, (iii) tokenization, (iv) lower case reduction, (v) stop words removal (using the stop-words list provided by the NLTK framework [5]), (vi) stemming (using the Snowball stemmer), (vii) n …

Accomplishment Classifier with Machine Learning
K Barkai – 2017 – kaliaruth.minerva.community
… 1 # import stemmer to reduce words to stem words only 2 from nltk.stem.snowball import SnowballStemmer 3 # import string to remove punctuation from strings 4 import string 5 … 14 15 #stem words 16 x = 0 17 stemmer = SnowballStemmer(“english”) 18 while x < len(to_stem) …

Package ‘tm’
I Feinerer, K Hornik, MI Feinerer – 2017 – 164.41.45.44
… Package ‘tm’, December 6, 2017. Title: Text Mining Package. Version: 0.7-3. Date: 2017-12-06. Depends: R (>= 3.2.0), NLP (>= 0.1-6.2). Imports: Rcpp, parallel, slam (>= 0.1-37), stats, tools, utils, graphics, xml2. LinkingTo: BH …

Using Clustering for Categorization of Support Tickets
D Beneker, C Gips – 2017 – pdfs.semanticscholar.org
… Stemming: We used the Snowball Stemmer for stemform reduction. For words whose word stem occurs more than once, only the original of the first occurrence was retained …

Step 5–Text Mining and Recommender Systems
M Swamynathan – Mastering Machine Learning with Python in Six Steps, 2017 – Springer
… Figure 5-3. Most popular NLTK stemmers. from nltk import PorterStemmer, LancasterStemmer, SnowballStemmer. # Function to apply stemming to a list of words. def words_stemmer(words, type="PorterStemmer", lang="english", encoding="utf8") …

Teaching NLTK Norwegian
B Bjerke-Lindstrøm – 2017 – duo.uio.no
… Porter stemmer is set as ‘frozen’, this means that it’s not being further developed and used as a frozen standard for comparing new stemmers against. Even though Porter stemmer is set to frozen we can create modifications of it, like ‘Snowball stemmer’ …

Sentiment analysis of a German Twitter-Corpus
M Flender, C Gips – 2017 – pdfs.semanticscholar.org
… of stopwords. To evaluate the influence of stopword removal, this step also can be skipped. As a last preprocessing step the tweets are processed by the Snowball stemmer. Now the tokens are converted into n-grams. We use …

Successful Data Science Projects: Lessons Learned from Kaggle Competition
MZ Al-Taie, N Salim, AI Obasa – Kurdistan Journal of Applied …, 2017 – kjar.spu.edu.iq
… 3. Stemming: the author also used stemming before generating features like counting features and BOW/TF-IDF features. For this sake, he used Porter stemmer and Snowball stemmer from NLTK. 5.1.2. Feature Extraction/Selection …

SummTriver: A new trivergent model to evaluate summaries automatically without human references
LA Cabrera-Diego, JM Torres-Moreno – Data & Knowledge Engineering, 2017 – Elsevier
… It should be said that before obtaining the n-grams, SummTriver can preprocess the documents. In first place, we can lowercase the documents, delete numbers and stop-words. As well, it can stem the words from the documents using a Snowball stemmer. 5. Evaluation …

Tensor Factorization Model on Social Sentiment from Textual Reviews
M Preethi, KF Bharati – ijetcse.com
… III. Stemming: It is defined as a process to reduce the derived words to their original word stem. For example, “talked”, “talking”, “talks” as based on the root word “talk”. We have used Snowball stemmer to reduce the derived word to their origin …

Developing a Stemmer for German Based on a Comparative Analysis of Publicly Available Stemmers
L Weissweiler, A Fraser – International Conference of the German Society …, 2017 – Springer
… Of all the stemmers presented here the Snowball stemmer is the only one for which an official implementation is available … Snowball. In 1996, Martin Porter developed the Snowball stemmer for English (Porter 1980). It became by far the most widely used stemmer for English …

Topic modeling of public repositories at scale using names in source code
V Markovtsev, E Kant – arXiv preprint arXiv:1704.00135, 2017 – arxiv.org
… D. Stemming names It is common to stem names when creating a bag-of-words in NLP. Since we are working with natural language that is predominantly English we have applied the Snowball stemmer [37] from the Natural Language Toolkit (NLTK) [38] …

LITL at CLEF eHealth2017: automatic classification of death reports
LM Ho-Dac, C Fabre, A Birski, I Boudraa, A Bourriot… – 2017 – pdfs.semanticscholar.org
… (footnote 8: The other features made available in the dictionaries were not exploited by our system) … lowercasing text fields, elision filtering, stopwords filtering (using the default Solr stopword list for French), light stemming using the French snowball stemmer. 4 Querying Solr …

Detecting policy preferences and dynamics in the un general debate with neural word embeddings
S Gurciullo, S Mikhaylov – arXiv preprint arXiv:1707.03490, 2017 – arxiv.org
… nor stopwords. The documents are then tokenized, and a snowball stemmer (Porter, 2001) is implemented in order to reduce similar words to their root form (eg the tokens ‘economy’ and ‘economic’ would reduce to ‘econ’). All …

Optimizing XCSR for text classification
MH Arif, J Li, M Iqbal, H Peng – Service-Oriented System …, 2017 – ieeexplore.ieee.org
… Emoticons in text are replaced with #emoPositive and #emoNegative hash tags. 4) Unigram tokens are extracted using white spaces (space, tab) as delimiters. 5) Stemming converts the word in its principle form. We used the snowball stemmer available in WEKA …

Strategies towards digital and semi-automated curation in RegulonDB
F Rinaldi, O Lithgow, S Gama-Castro, H Solano… – Database, 2017 – academic.oup.com
Abstract. Experimentally generated biological information needs to be organized and structured in order to become meaningful knowledge. However, the rate at wh.

Characterizing malicious Android apps by mining topic-specific data flow signatures
X Yang, D Lo, L Li, X Xia, TF Bissyandé… – Information and Software …, 2017 – Elsevier
… vocabulary of NLTK. Subsequently, we use the Snowball stemmer [24] to transform the remaining terms to their root forms (eg, reading and reads are reduced to) to unify similar words into a common representation. Finally, we …

SentiCR: A customized sentiment analysis tool for code review interactions
T Ahmed, A Bosu, A Iqbal, S Rahimi – Proceedings of the 32nd IEEE …, 2017 – dl.acm.org
… 5) Word Stemming: We used NLTK word tokenizer to parse each text into a list of words. Next, we applied Snowball Stemmer [38] to convert each word to its stem. 6) Stop-Word Removal: Many …

Combining bag-of-words and sentiment features of annual reports to predict abnormal stock returns
P Hájek – Neural Computing and Applications, 2017 – Springer
… To identify a set of useful n-grams, we first removed stop words, performed stemming using the Snowball stemmer, and converted all word tokens to lower case letters …

Applications of twitter emotion detection for stock market prediction
CH Liu – 2017 – dspace.mit.edu
… 3.4.2 Data Preparation: All words in the NRC Lexicon and all unigrams and bigrams in all tweets were converted to lowercase and stemmed with NLTK’s Snowball Stemmer. This is to ensure that two English words with the same base word, but different tenses or forms …

What is PostgreSQL
O Bartunov – postgrespro.ru
… Convert colors; convert URLs to canonical way (http://a.in/a/./index.html → http://a.in/a/index.html) … Built-in dictionary templates: ispell, myspell, hunspell; snowball stemmer; thesaurus; synonym; simple … Dictionaries: a dictionary is a program …

Beyond instance-level image retrieval: Leveraging captions to learn a global visual representation for semantic retrieval
A Gordo, D Larlus – IEEE Conference on Computer Vision …, 2017 – openaccess.thecvf.com
… We stem the words using the Snowball stemmer from NLTK [9]. Our models are learned with a batch size of 64 triplets (sextuples depending on the setup) using the ADAM optimizer with an initial learning rate of 10⁻⁵, which is reduced to 10⁻⁶ after 8k iterations …

Bayesian Nonparametric Models on Big Data
F Ozcan – 2017 – search.proquest.com
Abstract: This thesis focuses on the role investor type and sentiment play in financial markets, using data from social media. First paper investigates the effect of the interaction between …

Processing affect in social media: a comparison of methods to distinguish emotions in tweets
R Meo, E Sulis – ACM Transactions on Internet Technology (TOIT), 2017 – dl.acm.org
… grams of adjacent words). After this first step, we applied stemming (Snowball Stemmer) [Porter 2001] and stop words elimination so a consistent cleaning and reduction of the number of terms is achieved. This latter method …

Bilingual Sentiment Analysis of Spanglish Tweets
M Serrano – 2017 – search.proquest.com
… Figure 19. Spanglish cases uses English and Spanish Snowball stemmer … 50 corpora and lexical resources. In the preprocessing of the data we used the NLTK stop words corpus for English along with the English Snowball Stemmer provided by NLTK …

Automatic Identification and Classification of Software Development Video Tutorial Fragments
L Ponzanelli, G Bavota, A Mocci… – IEEE Transactions …, 2017 – ieeexplore.ieee.org

Storage and Transformation for Data Analysis Using NoSQL
C Nilsson, J Bengtson – 2017 – diva-portal.org
… Linköping University, Department of Computer Science. Master thesis, 30 ECTS, Information Technology, 2017. LIU-IDA/LITH-EX-A–17/049–SE …

Ltrciiith at ibereval 2017: Stance and gender detection in tweets on catalan independence
S Swami, A Khandelwal, M Shrivastava… – Proceedings of the … – pdfs.semanticscholar.org
… Therefore, ‘#’ is removed from the hashtags and all the words are extracted from the hashtag. And then each word is considered as a separate token. All the tokens in Spanish are then stemmed using Snowballstemmer implemented in NLTK. 3.2 Features …

A Systematic Literature Review of Sentiment Analysis Techniques
J Kaur, SS Sehra, SK Sehra – International Journal of …, 2017 – pdfs.semanticscholar.org
… Case Converter: It converts all the terms present in the text to lower case. • Stemmer: It stems the terms present in the text by the stemmer algorithms which are provided by the Snowball stemmer library. Porter snowball stemmer is used in our project …

Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite
M Sukhareva, F Fuscagni, J Daxenberger… – Proceedings of the …, 2017 – aclweb.org
… For this purpose, we used Snowball stemmer for German (http://snowballstem.org). Then, we split all the Hittite words into characters and word boundaries were marked with a special character … Lower thresholds … (footnote 5: Full German word form is Ägypten.) …

Statistical Stemmers: A Reproducibility Study
G Silvello, R Bucco, G Busato, G Fornari, A Langeli… – dei.unipd.it
… (http://snowballstem.org/) … We decided to reproduce these three papers with the aim of making the stemmers they propose readily available to the research community such that they can be easily included in baseline systems and longitudinal studies …

SIIM 2017 Scientific Session Posters & Demonstrations Choosing Wisely in Medical Imaging: Natural Language Processing (NLP) Proof of Concept in …
CR Dyck, J Luo, I Wong, B Forster – siim.org
… negative exam results. 2. Boundary detection: existing libraries in Natural Language Toolkit (NLTK) are used to split sentences and words into individual items; 3. Word normalization: SnowballStemmer from NLTK was used to break down individual words into their root form …

Mama Edha at SemEval-2017 Task 8: Stance Classification with CNN and Rules
MG Lozano, H Lilja, E Tjörnhammar… – Proceedings of the 11th …, 2017 – aclweb.org
… The measures we applied were, eg, splitting contractions and stemming the words. For the stemming we employed the Python package “Snowballstemmer”. Also, in an effort to avoid training the classifiers on data that could be too context specific, eg, a link or a user name …

Utilizing typed dependency subtree patterns for answer sentence generation in question answering systems
R Perera, P Nand, A Naeem – Progress in Artificial Intelligence, 2017 – Springer
… sentence to assign the score. The four Meteor modules are exact matcher, stem matcher (uses the Snowball Stemmer [31]), synonym matcher (based on WordNet [23] synonyms), and the paraphrase matcher. This metric is more …

Detecting clinically related content in online patient posts
C VanDam, S Kanthawala, W Pratt, J Chai… – Journal of biomedical …, 2017 – Elsevier
… We applied the Snowball Stemmer [31], an improvement on the Porter Stemmer, to unigrams. We found that the Snowball Stemmer returned more human readable stems, which was important for understanding the terms that influenced performance …

Word2Vec inversion and traditional text classifiers for phenotyping lupus
CA Turner, AD Jacobs… – BMC medical …, 2017 – bmcmedinformdecismak …
… used for this step. Python’s String library was used to parse out punctuation. Stop words were removed using nltk. This was followed by stemming using nltk’s SnowballStemmer [17]. Concept unique identifiers. In order to extract …

” What else are you worried about?”–Integrating textual responses into quantitative social science research
JM Rohrer, M Brümmer, SC Schmukle, J Goebel… – PloS one, 2017 – journals.plos.org
… Finally, words were reduced to their word stem (eg “politischen,” political, and “politiker,” politicians, to “polit”; “kind,” child, and “kinder,” children, to “kind”) by applying the German Snowball stemmer list [32, 33] but then re-expanded (eg “polit” to “politik,” politics, “kind” to “kinder …

DESIGN AND DEVELOPMENT OF DICTIONARY-BASED STEMMER FOR THE URDU LANGUAGE.
Z HUSSAIN, S IQBAL, T SABA… – … of Theoretical & …, 2017 – search.ebscohost.com
… Porter suggested the use of dictionaries overcome the mentioned stemming issues. Porter’s stemmer is widely used for the stemming of English and other European languages. It can be accessed online from the link http://snowballstem.org/demo.html …

Fasttext and Gradient Boosted Trees at GermEval-2017 on Relevance Classification and Document-level Polarity
L Hövelmann, S Allee… – Shared Task on Aspect …, 2017 – inf.uni-hamburg.de
… The vector entries are the TF-IDF (term frequency-inverse document frequency) values of the terms (Spärck Jones, 1972). Footnote 1: available from http://snowballstem.org/, last access: 2017-07-19, license: 3-clause BSD …

Deep Stylometry and Lexical & Syntactic Features Based Author Attribution on PLoS Digital Repository
I Safder, M Shabbir – Digital Libraries: Data, Information, and Knowledge …, 2017 – Springer
… For function word removal we use Apache Open NLP, a machine learning based toolkit used for the pre-processing of the natural language text [21]. For stemming, Snowball stemmer is used, a common stemming algorithm for information …

Deep Stylometry and Lexical & Syntactic Features Based Author Attribution on PLoS Digital Repository
SU Hassan, M Imran, T Iftikhar, I Safder… – … Conference on Asian …, 2017 – Springer
… For function word removal we use Apache Open NLP – a machine learning based toolkit used for the pre-processing of the natural language text [21]. For stemming, Snowball stemmer is used – a common stemming algorithm for information retrieval pre-processing [22]. The Fig …

Neural Captioning for the ImageCLEF 2017 Medical Image Challenges
D Lyndon, A Kumar, J Kim – ceur-ws.org
… to lower case, removing all punctuation (some captions contained multiple sentences, however, after this step each caption became a single sentence), removing stopwords using the NLTK [1] English stopword list and finally applying stemming using NLTK’s Snowball stemmer …

Opinion mining using an LVQ neural network
M Stylianidis, E Galiotou, C Sgouropoulou… – Proceedings of the 21st …, 2017 – dl.acm.org
… Matthaios Stylianidis, Department of Informatics, Athens University of Applied Sciences, Aigaleo 12210, Greece, cs131060@teiath.gr; Eleni Galiotou, Department of Informatics …

Precedent vs. Politics? Case Similarity Predicts Supreme Court Decisions Better Than Ideology
E Ash, D Chen, S Panicker, A Trivedi – 2017 – nber.org
… This was done using NLTK’s Stanford POS tagger. Additionally, to discard multiple occurrences of the same root word, we use NLTK’s Snowball Stemmer. 3.3 Model 1. LDA: To choose the number of topics on which LDA should be trained, we used …

Automatic Extraction of Design Decision Relationships from a Task Management System
M Ruppel – wwwmatthes.in.tum.de
… Department of Informatics (Fakultät für Informatik), Technical University of Munich. Master's thesis in Information Systems (Wirtschaftsinformatik): Automatic Extraction of Design Decision Relationships from a Task Management System. Matthias Ruppel …

Appendices: Positioning Under Alternative Electoral Systems: Evidence From Japanese Candidate Election Manifestos
A Catalinac – static.cambridge.org
… Using this combined TDM, we created our own list. Our list contained many words whose English-language equivalents are part of the list of stop words contained in the Snowball stemmer, such as “after”, “again”, “further”, and “make” …

Multi-Lingual LSA with Serbian and Croatian: An Investigative Case Study
C Layfield, D Ivanovic, J Azzopardi – researchgate.net
… A small Serbian and Croatian stop word list was utilised and a Snowball stemmer (http://snowballstem.org) applied to all the text; the last step entails … (footnote 4: As a side effect, the XML turned out to be badly formed in places and needed to be fixed by hand.) …

Implementation of Feature Selection and balanced random forest for Sentimental Analysis of Text Databases
E Babita, EP Singh – ijiet.com
… For example, “talked”, “talking”, “talks” as based on the root word “talk”. We have used Snowball stemmer to reduce the derived word to their origin. 3. Applying the Correlation based feature selection algorithm on collected data. CFS is correlation based feature selection …

A simple approach to multilingual polarity classification in Twitter
ES Tellez, S Miranda-Jiménez, M Graff… – Pattern Recognition …, 2017 – Elsevier
… In our study, we use the Snowball Stemmer for Spanish and Italian, and the Porter Stemmer for English; all of them are implemented in NLTK package [9]. 2.2.3. Negation (neg) feature. Negation markers might change the polarity of the message …

A Cross-Modal Concept Detection and Caption Prediction Approach in ImageCLEFcaption Track of ImageCLEF 2017
MM Rahman, T Lagree, M Taylor – ceur-ws.org
… Stopwords are removed using NLTK’s “english” stopword list and subsequently terms are removed from the vocabulary that occur in fewer than 10 captions, and finally the remaining words are reduced to their stems using NLTK’s Snowball stemmer, which finally form the …

Surf: Summarizer of user reviews feedback
A Di Sorbo, S Panichella, CV Alexandru… – … Companion (ICSE-C …, 2017 – ieeexplore.ieee.org
… On top of these topic-related dictionaries we built an NLP classifier to automatically assign every sentence in a review to one or more topics. Each sentence is stemmed (ie, reduced to its root form) using the Snowball Stemmer Algorithm [16] …

Towards the Automatic Sentiment Analysis of German News and Forum Documents
A Lommatzsch, F Bütow, D Ploch… – … Conference on Innovations …, 2017 – Springer
… We choose the Multinomial Naive Bayes classifier and evaluate our model applying 10-fold cross-validation. In order to calculate the feature vectors for the sentences we use bigrams, single word tokens, a customized stop words list and the German snowball stemmer. Results …

Target oriented tweets monitoring system during natural disasters
SSM Win, TN Aung – … and Information Science (ICIS), 2017 IEEE …, 2017 – ieeexplore.ieee.org
… For the purpose of stemming, this system uses a popular snowball stemmer. C. Feature extraction: Feature extraction is the transformation of arbitrary data such as images or text into numerical features usable for classification …

Semantic Enriched Short Text Clustering
M Kozlowski, H Rybinski – International Symposium on Methodologies for …, 2017 – Springer
… representations. The answers (usually short phrases or sentences) were processed by sentence segmentation, word tokenization, stop-words cleaning, spell-correction, then finally tokens were stemmed (Snowball Stemmer) …

Job description mining to understand undergraduate co-operative placements
S Chopra – 2017 – uwspace.uwaterloo.ca
… A thesis by Shivangi Chopra, presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science in Management Sciences …

Automated Classification of Multi-Labeled Patient Safety Reports: A Shift from Quantity to Quality Measure.
C Liang, Y Gong – Studies in health technology and informatics, 2017 – clianglab.com
… See Table 2. Since these binary classifiers are not originally designed to process text data, we prepared our corpus as follows. (1) Snowball stemmer was used to reduce inflected terms to theirs root form [16]. (2) Rainbow list was used to remove stop words [17] …

Strategies for gazetteer improvement and enrichment
SK Singh – 2017 – era.library.ualberta.ca
… A thesis by Sanket Kumar Singh, submitted in partial fulfillment of the requirements for the degree of Master of Science, Department of Computing Science, University of Alberta. © Sanket Kumar Singh, 2017 …

Identifying Effective Signals to Predict Deleted and Suspended Accounts on Twitter Across Languages.
S Volkova, E Bell – ICWSM, 2017 – cs.jhu.edu
… Footnotes 6 to 12: karpathy.github.io/2015/05/21/rnn-effectiveness/; https://keras.io/getting-started/sequential-model-guide/; https://keras.io/activations/; https://keras.io/optimizers/#rmsprop; https://pypi.python.org/pypi/pymorphy2; https://pypi.python.org/pypi/snowballstemmer; http://scikit …

Document Analysis Based on Multi-view Intact Space Learning with Manifold Regularization
Z Zhan, Z Ma – International Conference on Intelligent Science and Big …, 2017 – Springer
… of each term. Also, Stopwords and SnowballStemmer from NLTK library are used to remove the stop words and stem the tokens. In addition, tokens contain non-latin letters are dropped in these preprocess. For Chinese items …

Predicting Bad Patents: Employing Machine Learning to Predict Post-Grant Review Outcomes for US Patents
D Winer – 2017 – pdfs.semanticscholar.org
… they likely reflect the same concept across different patents in the corpus. To resolve this issue, we used the Natural Language Toolkit (NLTK) Snowball Stemmer to use stem words as features (NLTK 2016). This approach essentially …

Text-based question routing for question answering communities via deep learning
A Azzam, N Tazi, A Hossny – Proceedings of the Symposium on Applied …, 2017 – dl.acm.org
… These actions include HTML tags removal, tokenization and stop words filtration. The output of these pre-processing actions is clean word vectors that are stemmed using stemming algorithms written by the Snowball language (https://github.com/snowballstem) …

DEXTER: A workbench for automatic term extraction with specialized corpora
C PERIÑAN-PASCUAL – Natural Language Engineering, 2017 – cambridge.org
… numbers from the token stream. Then, the tokens are processed by the Snowball stemmer, and unigrams, bigrams and trigrams are derived from the stems. The default frequency threshold is set to three. As usual, a threshold …

KC Kit, D Rossiter – cse.ust.hk
… For example, the word ‘class’ and ‘classes’ are stemmed into ‘class’ while ‘classic’ is stemmed as classic. There exists Snowball stemmer implementation [13] of the Porter stemming Page 24. RO2 FYT – Business Lead Qualification by Online Information Scraping – Methodology …

Email Classification Using Machine Learning Algorithms
A Radhakrishnan, V Vaidhehi – enggjournals.com
… mining task. All the data are lowercased and stemming is performed on the dataset. Snowball stemmer is used to do the so-called step. It is a small string processing language designed for use in information retrieval. All the …

Deep Survey on Sentiment Analysis and Opinion Mining on Social Networking Sites and E-Commerce Website
P Arya, A mit Bhagat, B MANIT – International Journal of Engineering …, 2017 – ijesc.org
… [34] 03 POS tagger Twitter POS tagger www.cs.cmu.edu/~ark/TweetNLP/ [35][36] 04 Snowball English stemmer https://pypi.python.org/pypi/snowballstemmer [37] 05 Stanford Log-linear Part-Of-Speech Tagger POS tagger http://nlp.stanford.edu/software/tagger.shtml [38] …

Tag Recommendation for Short Arabic Text by Using Latent Semantic Analysis of Wikipedia
YKA Samra, IM Alagha – 2017 – mobt3ath.com
Page 1. Tag Recommendation for Short Arabic Text by Using Latent Semantic Analysis of Wikipedia Yousef K. Abu Samra Supervised By: Dr. Iyad M. Alagha Assistant Professor of Computer Science …

Biasness Identification of Talk Show’s Host by Using Twitter Data
S Ijaz, MIU Lali, B Shahzad, A Imran, S Tiwana – researchgate.net
… The results completely depend on the corpus and the algorithm used for stemming. Weka has two built-in stemming algorithms (1) Snowball Stemmer also known as porter stemmer and the second one is Lovins Stemmer. After …

Transfer Learning for Multi-language Twitter Election Classification
X Yang, R McCreadie, C Macdonald, I Ounis – electoralviolenceproject.com
… Over 20M English tweets are observed in the English Twitter corpus, while there are only over 5M Spanish tweets in the Spanish Twitter corpus. We apply stop-word removal and the Snowball stemmer for the corresponding language to all tweets …

A Benchmark Collection for Program Objectives Mapping to ABET Outcomes: Accreditation
A Osman, AA Yahya, MB Kamal – waset.org
… TABLE II. THE SPECIFICATION OF THE MAIN PROCESSING STEPS IMPLEMENTED ON THE COLLECTION Step Specification Tokenization Alphabetic Tokenize Lowercase Tokens True Stop words Handler Rainbow Stemmer Snowball Stemmer Min …

Transfer Learning for Multi-language Twitter Election Classification
X Yang, R McCreadie, C Macdonald… – Proceedings of the 2017 …, 2017 – dl.acm.org
… Over 20M English tweets are observed in the English Twitter corpus, while there are only over 5M Spanish tweets in the Spanish Twitter corpus. We apply stop-word removal and the Snowball stemmer for the corresponding language to all tweets …

CREATE: Clinical REcords Analysis Technology Ensemble
A Dekhtyara, S Durstb, V Kaganb, A Stevensb… – joshterrell.com
… 3 Page 4. the Stanford parser and Part of Speech Tagger [39], and the Snowball stemmer for English[7]. SentEmotion … Stopword removal using the suggested english stopword lists in NLTK [6] and Scikit-Learn [29] • Stemming using the Snowball Stemmer [7] …

Construction accident narrative classification: An evaluation of text mining techniques
YM Goh, CU Ubeynarayana – Accident Analysis & Prevention, 2017 – Elsevier
… Thus, word stemming was carried out using the Snowball Stemmer (Shibukawa 2013), to obtain word stems in which the different forms of the words is aggregated to a single type. As an example, Fig. 2 is the processed text of the narrative in Fig. 1. Fig …

Detection of Topics and Construction of Search Rules on Twitter
ED Martínez, JP Fonseca, VM González… – Colombian Conference …, 2017 – Springer
… In addition to maintaining the supersets, a grouping is also performed by reducing words to their minimum expression (stemming). To achieve this we used Snowball stemmer, which does this in an algorithmic way in Spanish [17] …

Improved query reformulation for concept location using CodeRank and document structures
MM Rahman, CK Roy – Proceedings of the 32nd IEEE/ACM International …, 2017 – dl.acm.org
… Although existing studies suggest contradictory [28, 45] or conflicting [24] evidences for stemming with the source code, we investigate the impact of stemming with RQ4 where Snowball stemmer [24, 37] is used for stemming …

A case study of Spanish text transformations for twitter sentiment analysis
ES Tellez, S Miranda-Jiménez, M Graff… – Expert Systems with …, 2017 – Elsevier
… collapsing derivationally related words. In our study, we use the Snowball Stemmer for the Spanish language implemented in NLTK package (Bird et al., 2009). 3.1.5. Lemmatization (lem). Lemmatization process is a complex …

Placing of photos on the internet: Critical analysis of biases on the depictions of France and Afghanistan on FLICKR
C Lambio, T Lakes – Geoforum, 2017 – Elsevier

Task Oriented Tools for Information Retrieval
P Yang – 2017 – search.proquest.com
… understand the corresponding components. For example, indexing option can be easily changed from porter stemmer to snowball stemmer so that the impact of such change can be monitored. 1.2 Unified Reproducibility Evaluation System …

Validating a sentiment dictionary for German political language
C RAUH – 2017 – researchgate.net
… 1 accordingly. 9 I have employed the list of 231 German stopwords used in the Snowball Stemmer project, which can be accessed at http://snowball.tartarus.org/algorithms/german/stop.txt (last accessed: 25.07.2016). 10 The …

Time dependency in topic models of media attention
B Wüest – pdfs.semanticscholar.org
… For now, we exclude other lexical categories since we find that common nouns and verbs make up more than 95% of all anchors. i) The lemma, POS tag, stem by the English Snowball stemmer (Porter 1980, 2001) of the word …

Finding the Topics of Case Law: Latent Dirichlet Allocation on Supreme Court Decisions
Y Remmits – 2017 – theses.ubn.ru.nl
… An example of stemming would be reducing both ‘walks’ and ‘walked’ to their stem ‘walk’. However, this is a rather crude instrument at moments: both ‘marketing’ and ‘markets’ to ‘market’ by the well-known Snowball stemmer [18] …

Identifying reference spans: topic modeling and word embeddings help IR
L Moraes, S Baki, R Verma, D Lee – International Journal on Digital …, 2017 – Springer
… To remove the effect of using words in their different forms, we used stemming (st) to reduce words to their root form. For this purpose, we use the Snowball Stemmer, provided by the NLTK package [7]. WordNet has been utilized to expand the semantics of the sentence …

Sentiment analysis as reputational risk indicator
CP González, HMM Rodrigues, R Rodríguez-Oliveros – researchgate.net
… URL https://astro.uni-bonn.de/~kbasu/ObsCosmo/Slides2012/Lecture3_2012.pdf. MF Porter. Snowball, Sep 2015. URL http://snowballstem.org/. Christopher Potts. Sentiment analysis symposium, san francisco, november 8-9, 2011. In Sentiment Symposium Tutorial, 2011 …

Advanced Text Analytics and Machine Learning Approach for Document Classification
C Anne – 2017 – scholarworks.uno.edu
Page 1. University of New Orleans ScholarWorks@UNO University of New Orleans Theses and Dissertations Dissertations and Theses Spring 5-19-2017 Advanced Text Analytics and Machine Learning Approach for Document Classification …

Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models
MC Hughes, L Weiner, G Hope, TH McCoy Jr… – arXiv preprint arXiv …, 2017 – arxiv.org
Page 1. Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models Michael C. Hughes*1, Leah Weiner2, Gabriel Hope3, Thomas H. McCoy, Jr.4, Roy H. Perlis4, Erik B. Sudderth3, and Finale Doshi-Velez1 …

Entity-centric Topic Extraction and Exploration: A Network-based Approach
A Spitz, M Gertz – dbs.ifi.uni-heidelberg.de
… We find that the 2 https://www.ambiverse.com/ 3 http://snowballstem.org/ Page 9. Table 1. Traditional topics as ranked lists of terms, extracted for the four top-ranked edges in the network generated from the subset of NewYork Times articles …

Frlink: Improving the recovery of missing issue-commit links by revisiting file relevance
Y Sun, Q Wang, Y Yang – Information and Software Technology, 2017 – Elsevier

What Else Are You Worried About?
JM Rohrer, M Brümmer, SC Schmukle, J Goebel… – … Textual Responses into …, 2017 – osf.io
… word stem (eg “politischen,” political, and “politiker,” politicians, to “polit”; “kind,” child, and “kinder,” children, to “kind”) by applying the German Snowball stemmer list [32, 33] but then re-expanded (eg “polit” to “politik,” politics, “kind” to “kinder,” children) by applying …

Cross-lingual RST discourse parsing
C Braud, M Coavoux, A Søgaard – arXiv preprint arXiv:1701.02946, 2017 – arxiv.org
… 16 Having more than three tokens in the head set is rare. 17 We found similar performance on the other test set. 18 http://ufal.mff.cuni.cz/udpipe 19 https://en.wiktionary.org/wiki/User:Matthias_Buchmeier 20 https://pypi.python.org/pypi/snowballstemmer Page 8 …

Identifying child abuse through text mining and machine learning
C Amrit, T Paauw, R Aly, M Lavric – Expert systems with applications, 2017 – Elsevier
… In order to group various forms of the same words together, all words were reduced to their stemmed form using the Dutch Snowball stemmer 4 . This stemming framework proposed by Porter (1980) is included in the Python NLTK package. 4.5. Tokenization …

Multiclass patent document classification
C Anne, A Mishra, MT Hoque, S Tu – Artificial Intelligence Research, 2017 – sciedu.ca
… TFTransform True TFTransform is set true for the calculation of word weight. IDFTransform True IDFTransform is set true for the calculation of word weight. Stemmer Snowball stemmer Snowball Algorithm is selected for this option …

Discovering Words Association to Enhance the Effectiveness of Amharic Information Retrieval System
MT ABEBE – 2017 – etd.aau.edu.et
… “search engines are the most visible information retrieval applications” and a classic stop words set such as the one adopted by the Snowball stemmer,1 the effect of stop-word removal would be: “search engine most visible information retrieval applications” …

Multilingual Keyphrase Extraction and Advanced Localisation Strategies
D Degl’Innocenti – 2017 – dspace-uniud.cineca.it
Page 1. Universita degli Studi di Udine Dipartimento di Scienze matematiche, informatiche e fisiche Dottorato di Ricerca in Informatica e Scienze Matematiche e Fisiche Ph.D. Thesis Multilingual Keyphrase Extraction and Advanced Localisation Strategies Candidate …

Using word embeddings to enforce document-level lexical consistency in machine translation
EM Garcia, C Creus, C España-Bonet… – The Prague Bulletin of …, 2017 – degruyter.com
… 2Recall that phrase-based decoders perform translations by, in particular, using arbitrary alignments from source words to target words. For the SSLC feature we consider only the one-to-one word alignments. 3http://snowballstem.org/ 88 …

Resolving Range Violations in DBpedia
P Lertvittayakumjorn, N Kertkeidkachorn… – Joint International …, 2017 – Springer
… $$o'$$. We also examined the distribution of correct answers among the three portions of $$S_t$$: $$S_{t,1}$$, $$S_{t,2}$$, and $$S_{t,3}$$. Please note that, in this (and the next) experiment, the parameter $$\tau$$ for selecting clue texts was set at 0.9 and we used Snowball stemmer 5 to …

Section-wise indexing and retrieval of research articles
A Shahid, MT Afzal – Cluster Computing, 2017 – Springer
Page 1. Cluster Comput DOI 10.1007/s10586-017-0914-4 Section-wise indexing and retrieval of research articles Abdul Shahid1 · Muhammad Tanvir Afzal2 Received: 20 March 2017 / Revised: 22 April 2017 / Accepted: 5 May …

Geo-Social Analytics Based on Spatio-Temporal Dynamics of Marijuana-Related Tweets
J Turner, M Kantardzic – … of the 2017 International Conference on …, 2017 – dl.acm.org
… more sparse. Post prediction, we examined the text of the marijuana-related tweets. In order to accomplish this, we removed stop-words and punctuation then normalized the text by using a Snowball stemmer. Table 7 shows …

Extracting Conceptual Relationships and Inducing Concept Lattices from Unstructured Text
VS Anoop, S Asharaf – Journal of Intelligent Systems – degruyter.com
Abstract. Concept and relationship extraction from unstructured text data plays a key role in meaning aware computing paradigms, which make computers intelligent by helping them learn, interpret, and synthesis information. These concepts and relationships leverage knowledge …

Probabilistic Topic Modelling for Controlled Snowball Sampling in Citation Network Collection
H Dobrovolskyi, N Keberle, O Todoriko – International Conference on …, 2017 – Springer
… 7. See NLTK Stemmers http://www.nltk.org/howto/stem.html. 8. See NLTK part-of-speech tagger http://www.nltk.org/book/ch05.html. 9. For instance list of English stop words is available at Snowball stemmer site http://snowball.tartarus.org/algorithms/english/stop.txt …

Modeling Inflectional Complexity in Natural Language Processing
G Nicolai – 2017 – era.library.ualberta.ca
Page 1. Modeling Inflectional Complexity in Natural Language Processing by Garrett Nicolai A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Computing Science University of Alberta © Garrett Nicolai, 2017 Page 2 …

Which Leaflet is More Effective: A Reanalysis
J Peacock, H Sethu – 2017 – humaneleaguelabs.org
Page 1. Which Leaflet is More Effective: A Reanalysis Jacob Peacock and Harish Sethu Humane League Labs Report E001R02 August 11, 2017 Abstract This document presents a reanalysis of the data and the conclusions …

A novel automatic satire and irony detection using ensembled feature selection and data mining
K Ravi, V Ravi – Knowledge-Based Systems, 2017 – Elsevier

Implicit Versus Explicit Corporate Social Responsibility Disclosure: A Textual Analysis
K Hummel, S Mittelbach-Hoermanseder, C Cho… – 2017 – papers.ssrn.com
… 7 Stop words are highly frequent words that convey little meaning. 8 We experiment with different stemmers (Porter stemmer, Lancaster stemmer, Snowball stemmer and Wordnet stemmer) and finally apply the Wordnet stemmer, as it produces the best results …

The signals and noise: actionable information in improvised social media channels during a disaster
X He, D Lu, D Margolin, M Wang, SE Idrissi… – Proceedings of the 2017 …, 2017 – dl.acm.org
Page 1. The Signals and Noise: Actionable Information in Improvised Social Media Channels During a Disaster Xingsheng He1,3, Di Lu1, Drew Margolin2, Mengdi Wang1, Salma El Idrissi2, Yu-Ru Lin1 1School of Computing …

Digital narratives of place: learning about neighborhood sense of place and travel through online responses
A Sekar, RB Chen, A Cruzat… – … Record: Journal of the …, 2017 – trrjournalonline.trb.org
Page 1. 10 Transportation Research Record: Journal of the Transportation Research Board, No. 2666, 2017, pp. 10–18. http://dx.doi.org/10.3141/2666-02 As the market penetration of mobile information and communication …

Doing Computational Social Science with Python: An Introduction
D Trilling – 2017 – papers.ssrn.com
Page 1. Doing Computational Social Science with Python: An Introduction Damian Trilling Version 1.0 PDF created January 9, 2017 Page 2. Copyright © 2015, 2016, 2017 Damian Trilling This document can be obtained from http://papers.ssrn.com/abstract=2737682 …

How do governments enact tax policy? Evidence from US states
E Ash – elliottash.com
Page 1. How do governments enact tax policy? Evidence from US states Elliott Ash* Abstract This paper contributes to recent work in political economy and public finance that focuses on how details of the tax code, rather than …

An Efficient Corpus-Based Stemmer
J Singh, V Gupta – Cognitive Computation, 2017 – Springer
… of nearly 50% in MAP. Rule-based Snowball stemmer, YASS, MORFESSOR, and HPS also performed almost equally and reported an improvement of nearly 45% against the unstemmed run. 4-GRAM and LINGUISTICA did …

Frontier Knowledge and Scientific Production: Evidence from the Collapse of International Science
A Iaria, C Schwarz, F Waldinger – The Quarterly Journal of …, 2017 – academic.oup.com
Page 1. FRONTIER KNOWLEDGE AND SCIENTIFIC PRODUCTION: EVIDENCE FROM THE COLLAPSE OF INTERNATIONAL SCIENCE* Alessandro Iaria Carlo Schwarz Fabian Waldinger December 6, 2017 Abstract We …

Improving biomedical information retrieval with pseudo and explicit relevance feedback
SWR Niesink – 2017 – essay.utwente.nl
Page 1. MASTER’S THESIS IMPROVING BIOMEDICAL INFORMATION RETRIEVAL WITH PSEUDO AND EXPLICIT RELEVANCE FEEDBACK SWR Niesink Computer Science (CSc): Data Science and Smart Services (DS3) …

Human-Computer Collaboration for Faster Document Comparison
A Liberg – 2017 – brage.bibsys.no
Page 1. Human-Computer Collaboration for Faster Document Comparison Audun Liberg Master of Science in Computer Science Supervisor: Rune Sætre, IDI Department of Computer Science Submission date: July 2017 Norwegian University of Science and Technology …

Augmenting Translation Lexica by Learning Generalised Translation Patterns
KK Mahesh – 2017 – run.unl.pt
Page 1. Kavitha Karimbi Mahesh Master of Computer Applications Augmenting Translation Lexica by Learning Generalised Translation Patterns Dissertação para obtenção do Grau de Doutor em Informática Orientador: Doutor …

Islamophobia and Media Portrayals of Muslim Women: A Computational Text Analysis of US News Coverage
R Terman – International Studies Quarterly, 2017 – academic.oup.com
Abstract. This article examines portrayals of Muslim women in US news media. I test two hypotheses derived from theories of gendered orientalism. First, US new.

Egeria: a framework for automatic synthesis of HPC advising tools through multi-layered natural language processing
H Guan, X Shen, H Krim – … of the International Conference for High …, 2017 – dl.acm.org
Page 1. Egeria: A Framework for Automatic Synthesis of HPC Advising Tools through Multi-Layered Natural Language Processing Hui Guan North Carolina State University Raleigh, NC 27695 hguan2@ncsu.edu Xipeng Shen …

Social Network Censorship: Topics, Techniques, and Impacts
RS Tanash – 2017 – scholarship.rice.edu
Page 1. Page 2. ABSTRACT Social Network Censorship: Topics, Techniques, and Impacts by Rima S. Tanash Previous research in digital censorship focused by and large on studying censorship of applications and networks …

Describing Urgent Event Diffusion On Twitter Using Network Statistics
H Sun – 2017 – drum.lib.umd.edu
Page 1. ABSTRACT Title of Dissertation: DESCRIBING URGENT EVENT DIFFUSION ON TWITTER USING NETWORK STATISTICS Hechao Sun, Doctor of Philosophy, 2017 Dissertation directed by: Assistant Professor, Bill …

A Guide to Text Analysis with Latent Semantic Analysis in R with Annotated Code: Studying Online Reviews and the Stack Exchange Community
D Gefen, JE Endicott, JE Fresneda, J Miller… – Communications of the …, 2017 – aisel.aisnet.org
Page 1. Communications of the Association for Information Systems Volume 41 Article 21 11-2017 A Guide to Text Analysis with Latent Semantic Analysis in R with Annotated Code: Studying Online Reviews and the Stack Exchange Community …

Analysis and prediction of Dutch-English code-switching in Dutch social media messages
N Dongen – 2017 – pdfs.semanticscholar.org
Page 1. Analysis and Prediction of Dutch-English Code-switching in Dutch Social Media Messages MSc Thesis (Afstudeerscriptie) written by Nina Dongen (born 01-02-1981 in Amsterdam) under the supervision of Dr. Raquel …

Event Identification in Social Media using Classification-Clustering Framework
N Alsaedi – 2017 – orca.cf.ac.uk
Page 1. Event Identification in Social Media using Classification-Clustering Framework A thesis submitted in partial fulfilment of the requirement for the degree of Doctor of Philosophy Nasser Alsaedi Cardiff University School of Computer Science & Informatics Jan 2017 Page 2. i …

A systematic review of text stemming techniques
J Singh, V Gupta – Artificial Intelligence Review, 2017 – Springer
Stemming is a program that matches the morphological variants of the word to its root word. Stemming is extensively used as a pre-processing tool in the field of natural language processing, informati.

Kernel-based distribution features for statistical tests and Bayesian inference
W Jitkrittum – 2017 – discovery.ucl.ac.uk
Page 1. Kernel-Based Distribution Features for Statistical Tests and Bayesian Inference Wittawat Jitkrittum A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of University College London …

Individuals recognition and simulation based on multiple data sources
RMP Barbosa – 2017 – recipp.ipp.pt
Page 1. Page 2. Dedicatory Este trabalho não poderia estar concluído sem antes expressar os contributos que não podem e nem devem deixar de ser realçados. Por essa razão, desejo expressar os meus sinceros agradecimentos …

Coreference resolution for biomedical pathway data
MJ Choi – 2017 – minerva-access.unimelb.edu.au
Page 1. Coreference Resolution for Biomedical Pathway Data Miji Jooyoung Choi School of Computing and Information Systems The University of Melbourne This thesis is submitted for the degree of Doctor of Philosophy September 2017 Page 2. Page 3. Declaration …

Towards a Better Human-Machine Collaboration in Statistical Translation: Example of Systematic Medical Reviews
J Ive – 2017 – theses.fr
Page 1. Towards a Better Human-Machine Collaboration in Statistical Translation : Example of Systematic Medical Reviews Thèse de doctorat de l’Université Paris-Saclay préparée à l’Université Paris-Sud École doctorale n°580 Sciences et technologies de l’information et de …

Predicting query quality for applications of text retrieval to software engineering tasks
C Mills, G Bavota, S Haiduc, R Oliveto… – ACM Transactions on …, 2017 – dl.acm.org
Page 1. 3 Predicting Query Quality for Applications of Text Retrieval to Software Engineering Tasks CHRIS MILLS, Florida State University GABRIELE BAVOTA, Universita della Svizzera italiana SONIA HAIDUC, Florida State …
