Gutenberg Corpus & Natural Language Processing 2015


Notes:

“Translationese” is the awkwardness or ungrammaticality of a translated text, caused by overly literal translation of idioms or syntax.

Resources:

  • cwb.sourceforge.net .. open-source tools for managing and querying large text corpora
  • gitenberg.org .. open-source community curating ebooks with detailed metadata in a variety of formats
  • gutentag.sdsu.edu .. NLP-driven tool for digital humanities research in the Project Gutenberg corpus
  • liwc.wpengine.com .. Linguistic Inquiry and Word Count (LIWC)
  • receptiviti.ai .. enabling AI platforms with emotional intelligence

See also:

Corpus Workbench


GutenTag: an NLP-driven tool for digital humanities research in the Project Gutenberg corpus J Brooke, A Hammond, G Hirst – … of the NAACL ’15 Workshop on …, 2015 – aclweb.org … 4 Subcorpus Filter Taken as a whole, the Gutenberg corpus is generally too diverse to be of use to researchers in particular fields. … In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP’13). Julian Brooke and Graeme Hirst. 2014. … Cited by 6 Related articles All 23 versions

Natural Language Processing using NLTK and WordNet A Farkiya, P Saini, S Sinha, S Desai – Citeseer … The remaining modules are task modules, each devoted to an individual natural language processing task. … Gutenberg Corpus: The Gutenberg Corpus is a collection of 14 texts chosen from Project Gutenberg – the largest online collection of free e-books. …

Exposing digital content as linked data, and linking them using StoryBlink B De Meester, T De Nies, L De Vocht… – NLP&DBpedia ( …, 2015 – biblio.ugent.be … Second, we review the used and relevant technologies for creating our proof-of-concept (Subsection 1.2). 1.1 Semantic Natural Language Processing … 1 http://www.gutenberg.org/ 2 e.g., LIDER and FREME (http://www.lider-project.eu/ and http://www. … Cited by 1 Related articles All 2 versions

Development and Use of Computational Morphology of Finnish in the Open Source and Open Science Era: Notes on Experiences with Omorfi Development. TA Pirinen – SKY Journal of Linguistics, 2015 – linguistics.fi … 12 <http://gutenberg.org> 13 <http://fi.wikipedia.org> 14 <http://ipsc.jrc.ec … with previous research on speed of optimised finite-state automata in natural language processing by Silfverberg … In the Gutenberg corpus, we get, among some missing proper nouns, archaic and dialectal … Related articles All 2 versions

Evidence of syntactic working memory usage in MEG data M van Schijndel, B Murphy, W Schuler – Proceedings of CMCL, 2015 – ling.ohio-state.edu … During the experiment, the audio track was recorded in parallel to enable subsequent synchronization between the brain activity and audio-book content. 1 http://www.gutenberg.org/cache/epub/219/pg219.txt; http://www.gutenberg.org/ebooks/20270 … Cited by 1 Related articles All 9 versions

Automatic detection of words associations in texts based on joint distribution of words occurrences D Santoni, E Pourabbas – Computational Intelligence, 2015 – Wiley Online Library … in evaluating association of words in terms of deviation from randomness without requiring corpora, which is an underexplored issue in natural language processing (NLP). … 2 http://www.sacred-texts.com/bib/dar/index.htm 3 http://www.gutenberg.org/ 4 http://www.gutenberg.org/ … Cited by 2 Related articles

A Corpus for Analyzing Text Reuse by People of Different Groups WA Cheema, F Najib, S Ahmed, SH Bukhari, A Sittar… – uni-weimar.de … Note that all the contributors were volunteers, and were not paid for the purpose of data collection. 5 http://www.wikipedia.org/ 6 http://www.gutenberg.org/ Page 5. 4 Peer Review … In: Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing. pp. … Related articles All 4 versions

“Pale as death” or “pâle comme la mort”: Frozen similes used as literary clichés S Mpouli, JG Ganascia – arXiv preprint arXiv:1511.01756, 2015 – arxiv.org … 2 www.gutenberg.org 3 beq.ebooksgratuits.com Page 7. … JURAFSKY, D., AND MARTIN, JH, 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition and Computational Linguistics. New Jersey: Prentice-Hall. … Cited by 1 Related articles All 5 versions

An automatic corpus based method for a building Multiple Fuzzy Word Dataset D Chandran, KA Crockett, D Mclean… – Fuzzy Systems (FUZZ- …, 2015 – ieeexplore.ieee.org … It has been used extensively in a number of different Natural Language Processing projects [27] and as a result it has had its effectiveness in the field proven. … Sentence Pairing Algorithm 1) Let T = set of sentences { , … … . in the Gutenberg Corpus where Si … … . . … Cited by 1 Related articles All 4 versions

Librispeech: an ASR corpus based on public domain audio books V Panayotov, G Chen, D Povey… – 2015 IEEE International …, 2015 – ieeexplore.ieee.org … in per-speaker audio duration. 5 https://librivox.org/api/info 6 http://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/ 7 http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs 5208 subset hours per-spk female … Cited by 33 Related articles All 9 versions

Distribution of English syllables in e-books of Project Gutenberg and the evolution of syllable number in two subcorpora S Guo, G Zhang, R Zhai, Z Song – Digital Scholarship in the …, 2015 – dsh.oxfordjournals.org Related articles All 3 versions

Software Library for Authorship Identification C Hantova, M Nisheva, P Ein-Dor, I Ivanov… – … and Preservation of …, 2015 – ceeol.com … The development of Natural Language Processing (NLP) makes it possible to parse the grammatical structure of the sentences more precisely and leads … to identification of authorship work in the context of the well-known Federalist papers (http://www.gutenberg.org/files/1404 … Related articles All 2 versions

Large linguistic corpus reduction with scp algorithms N Barbot, O Boëffard, J Chevelu, A Delhay – Computational Linguistics, 2015 – MIT Press … annotation). Similarly, in the natural language processing field (NLP), the adaptation of a generic model to a specific domain often requires new annotated data that illustrate its specificities (as in Candito, Anguiano, and Seddah 2011). … Related articles All 13 versions

Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks J Murdock, C Allen, S DeDeo – arXiv preprint arXiv:1509.07175, 2015 – arxiv.org … 2Accessible at http://hathitrust.org/; http://archive.org/; http://gutenberg.org/. … Located titles refers to the number identified in the HathiTrust (http://hathitrust.org/), Internet Archive (http://archive.org/), and Project Gutenberg (http://gutenberg.org/). … Cited by 1 Related articles All 4 versions

Building Resources for Philippine Languages N Oco, LR Syliongka, T Allman, RE Roxas – ilo, 2015 – academia.edu … The resources collected and developed can serve as catalyst in the fields of natural language processing, digital signal processing and linguistics, among others. … Proceedings of Recent Advances in Natural Language Processing, pp. 551-556. … Cited by 1 Related articles

The fractal patterns of words in a text: A method for automatic keyword extraction E Najafi, AH Darooneh – PloS one, 2015 – journals.plos.org … Data Availability: The sample text is available from http://www.gutenberg.org/files/22764/ and the glossary of the text is available from http://literature.org/authors/darwin-charles/the-origin-of-species/glossary.html. Funding: The authors have no support or funding to report. … Cited by 2 Related articles All 9 versions

Log-log convexity of type-token growth in Zipf’s systems F Font-Clos, Á Corral – Physical review letters, 2015 – APS … for detailed mathematical derivations. [33] CD Manning and H. Schütze, Foundations of Statistical Natural Language Processing (MIT Press, Cambridge, MA, 1999). [34] See http://www.gutenberg.org/. [35] A. Deluca and A. Corral … Cited by 3 Related articles All 11 versions

Comparing Writing Styles using Word Embedding and Dynamic Time Warping A Tushar, A Dahiya – arXiv preprint arXiv:1511.01666, 2015 – arxiv.org … This flow can be quantified and compared by analyzing the text using natural language processing techniques. … 1 Project Gutenberg https://www.gutenberg.org 2 Python adaptation here https://radimrehurek.com/gensim/ models/word2vec.html Page 3. … Related articles All 3 versions

A preliminary study on similarity-preserving digital book identifiers K Vladimir, M Silic, N Romic, G Delac… – LaTeCH 2015, 2015 – anthology.aclweb.org … ©2015 Association for Computational Linguistics and The Asian Federation of Natural Language Processing A preliminary study on similarity-preserving digital book identifiers Klemo Vladimir1, Marin Silic1, Nenad Romic2, Goran Delac1, and Sinisa Srbljic1 1University of … org. … Related articles All 8 versions

Free or Fixed Word Order: What Can Treebanks Reveal? V Kubon, M Lopatková – ceur-ws.org … This study constitutes a motivation for formal modeling of natural language processing methods. 1 Introduction … An introduction to the study of speech. Harcourt, Brace and Company, New York (1921) (http://www.gutenberg.org/files/12629/12629-h/ 12629-h.htm). … Cited by 1 Related articles All 2 versions

Toward an Automated Measure of Narrative Complexity S Harmon, A Jhala – Eleventh Artificial Intelligence and …, 2015 – pdfs.semanticscholar.org … Prior work in natural language processing has involved identification of several of these narrative elements. … Free ebooks – Project Gutenberg. Gutenberg.org. http://www.gutenberg.org/. He, H.; Barbosa, D.; and Kondrak, G. 2013. Identification of speakers in novels. … Related articles All 2 versions

A Semantic Web Approach For Linking Stories B De Meester, T De Nies, L De Vocht, R Verborgh… – ceur-ws.org … Natural Language Processing (NLP) is concerned with all interactions between computers and natural languages. … 3 http://uvdt.test.iminds.be/storyblinkdata/books 4 http://www.gutenberg.org/ Page 4. 4 Ben De Meester et al. StoryBlink! NotreDame De Paris … Related articles All 3 versions

Reconnecting Digital Publications to the Web using their Spatial Information B De Meester, T De Nies, R Verborgh… – Proceedings of the 24th …, 2015 – dl.acm.org … This first contribution is handled by using Natural Language Processing techniques to connect unstructured legacy content from Project Gütenberg2 with links on … what the location gravity point is of the content the user is currently 1 http://idpf.org/ 2 http://www.gutenberg.org/ … Cited by 1 Related articles All 4 versions

Word-Order Analysis Based Upon Treebank Data V Kuboň, M Lopatková – Mexican International Conference on Artificial …, 2015 – Springer … Comput. Linguist. 19, 313–330 (1993). 8. Oepen, S., Netter, K., Klein, J.: TSNLP – Test suites for natural language processing. … Harcourt Brace and Company, New York (1921). http://www.gutenberg.org/files/12629/12629-h/12629-h.htm. 13. … Cited by 1 Related articles

Detecting Semantic Change A Escorcio – 2015 – pdfs.semanticscholar.org … Word Sense Disambiguation (WSD) is an area of natural language processing that is concerned with being able to evaluate which of a word’s meanings is intended in a particular utterance. Consider these two extracts from Vladimir Nabokov’s Lolita: …

Measuring the Structural and Conceptual Similarity of Folktales using Plot Graphs VA Lestari, R Manurung – LaTeCH 2015, 2015 – aclweb.org … ©2015 Association for Computational Linguistics and The Asian Federation of Natural Language Processing Measuring the Structural and Conceptual Similarity of Folktales using Plot Graphs Victoria Anugrah Lestari and Ruli Manurung Faculty of Computer Science … Related articles All 7 versions

Using HFST—Helsinki Finite-State Technology for Recognizing Semantic Frames K Lindén, S Hardwick, M Silfverberg… – International Workshop on …, 2015 – Springer … 2 Tokenization Using hfst-pmatch. Tokenization is a necessary first step in most text-based natural language processing tasks. … References. 1. Brants, T.: TnT: a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. … Related articles All 3 versions

Unsupervised Text Normalization Using Distributed Representations of Words and Phrases VKR Sridhar – Proceedings of NAACL-HLT, 2015 – aclweb.org … In Proceedings of ACL, pages 286–293. R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of ICML. R. Collobert, K. Kavukcuoglu, and C. Farabet. 2011. … Cited by 3 Related articles All 7 versions

The development and psychometric properties of LIWC2015 JW Pennebaker, RL Boyd, K Jordan… – UT Faculty/Researcher …, 2015 – utexas-ir.tdl.org Page 1. The Development and Psychometric Properties of LIWC2015 James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn The University of Texas at Austin Correspondence should be sent to James W. Pennebaker … Cited by 664 Related articles All 11 versions

Catching the Red Priest: Using Historical Editions of Encyclopaedia Britannica to Track the Evolution of Reputations YF Luo, A Rumshisky, M Gronas – LaTeCH 2015, 2015 – aclweb.org … ©2015 Association for Computational Linguistics and The Asian Federation of Natural Language Processing Catching the Red Priest: Using Historical Editions of Encyclopaedia Britannica to Track the Evolution of Reputations Yen-Fu Luo†, Anna Rumshisky†, Mikhail Gronas … Related articles All 10 versions

Readers and Reading in the First World War S Towheed, F Benatti, EGC King – Yearbook of English Studies, 2015 – JSTOR … Hart, ‘The History and Philosophy of Project Gutenberg’, Project Gutenberg, 1992 15 <http://www.gutenberg.org/wiki/Gutenberg … The Stanford Natural Language Processing Group, Stanford Named Entity Recognizer (NER), 25 version 3.4 (Stanford University, 2006–14) <http:// … Related articles All 2 versions

Extracting Social Network from Literature to Predict Antagonist and Protagonist M Fernandez, M Peterson, B Ulmer – 2015 – nlp.stanford.edu … Free ebooks by Project Gutenberg Gutenberg. (nd). Retrieved November 17, 2015, from http://www.gutenberg.org/ Kunegis, J., Lommatzsch, A., & Bauckhage, C. (nd). The slashdot zoo. … 2014. The Stanford CoreNLP Natural Language Processing Toolkit. … Related articles All 3 versions

Leveraging Digital Data Streams: The Development and Validation of a Business Confidence Index G Piccoli, J Rodriguez… – System Sciences (HICSS), …, 2015 – ieeexplore.ieee.org Page 1. Leveraging Digital Data Streams: The Development and Validation of a Business Confidence Index Gabriele Piccoli Louisiana State University University of Pavia gpiccoli@lsu.edu Joaquin Rodriguez Engineering Ingegneria … Cited by 1 Related articles All 5 versions

Zonal Text Processing V Yatsko – Digital Scholarship in the Humanities, 2015 – dsh.oxfordjournals.org … equalizing is proposed. 1 Introduction. Nowadays natural language processing (NLP) is a vast interdisciplinary subject field that deals with the creation of software and hardware for speech and text processing. NLP hardware … Related articles All 2 versions

Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules MA Boukhaled, JG Ganascia – Natural Language Processing and …, 2015 – books.google.com Page 127. Mohamed Amine Boukhaled and Jean-Gabriel Ganascia Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules Abstract: Authorship attribution is the task of identifying the author of a given document. … Cited by 1 Related articles All 8 versions

The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations F Hill, A Bordes, S Chopra, J Weston – arXiv preprint arXiv:1511.02301, 2015 – arxiv.org … 1 https://www.gutenberg.org/ 2 The dataset can be downloaded from http://fb.ai/babi/. 2 Page 3. Under review as a conference paper at ICLR 2016 … Ask me anything: Dynamic memory networks for natural language processing. http://arxiv.org/abs/1506.07285, 2015. 10 Page 11. … Cited by 30 Related articles All 7 versions

A Parallel Corpus of Translationese E Rabinovich, S Wintner… – arXiv preprint arXiv: …, 2015 – pdfs.semanticscholar.org … 8 http://www.gutenberg.org 9 http://farkastranslations.com/ 10 http://en.wikisource.org/ Page 6. … Nisioi, S.: Unsupervised classification of translated texts. In Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E., eds.: Natural Language Processing and Information … Related articles

Network analysis of named entity interactions in written texts DR Amancio – arXiv preprint arXiv:1509.05281, 2015 – arxiv.org … IV. RESULTS AND DISCUSSION In this section, the topological properties of NE networks are investigated. In addition, we apply the proposed networked representation to tackle a natural language processing task related to anaphora (or co-reference) resolution. … Cited by 1 Related articles All 3 versions

An Efficient Conjunctive Keyword and Phase Search Scheme for Encrypted Cloud Storage Systems HT Poon, A Miri – 2015 IEEE 8th International Conference on …, 2015 – ieeexplore.ieee.org Page 1. An Efficient Conjunctive Keyword and Phase Search Scheme for Encrypted Cloud Storage Systems Hoi Ting Poon and Ali Miri Department of Computer Science Ryerson University Toronto, Ontario, Canada hoiting.poon@ryerson.ca, samiri@scs.ryerson.ca … Cited by 1 Related articles All 3 versions

Distributional Semantics and Authorship Differences M Gritta – 2015 – academia.edu … particular words? The word model used in this research is Distributional Semantics, which in natural language processing is based on the notion that linguistic items with … Available at: http://www.gutenberg.org/ebooks/3400 [Accessed: 2nd February 2015]. … Related articles

Strange bedfellows: Shifting paradigms in the corpus-based analyses of literary translations G Lynch – researchgate.net … Machine translation has been a mainstay of natural language processing research since Weaver’s pioneering paper in the 1940’s and attracts a … 20 https://developers.google.com/prediction/ 21 www.gutenberg.org 22 www.wikisource.org 23 See Altintas, Can, and Patton (2007 … Related articles

An Efficient Document Indexing-Based Similarity Search in Large Datasets TN Phan, M Jäger, S Nadschläger, J Küng… – … Conference on Future …, 2015 – Springer … Furthermore, k-shingles originating from natural language processing are commonly exploited to better represent documents than using terms because of their continuous order while two documents might have the same number of terms but they … http://www.gutenberg.org/ … Cited by 1 Related articles All 2 versions

Maximal repetitions in written texts: Finite energy hypothesis vs. strong Hilberg conjecture Ł Dębowski – Entropy, 2015 – mdpi.com … Therefore, state-of-the-art models in natural language processing overestimate the actual amount of randomness in texts. … the experiment, we have downloaded 14 texts in English, 10 texts in German and 11 texts in French from the Project Gutenberg (http://www.gutenberg.org/). … Cited by 4 Related articles All 11 versions

The evolution of conceptual diversity in economics titles from 1890 to 2012 S Guo, G Zhang, Q Ju, Y Chen, Q Chen, L Li – Scientometrics, 2015 – Springer Cited by 1 Related articles All 3 versions

Here be dragons? The perils and promises of inter-resource lexical-semantic mapping L Borin, R Johansson, LN Piña – … for Natural Language Processing and …, 2015 – ep.liu.se … Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015. … 2See <http://www.gutenberg.org/ebooks/22> and Cassidy (2000). … Related articles All 11 versions

The Significance of Mythological Motifs in two Gaelic Fairy Tales of the Nineteenth Century TG Dolan – dolanm.com … www.gutenberg.org/files/11027/11027-h/11027-h.htm#water Abbreviations RDF Resource Description Framework LOD Linked Open Data URI Uniform Resource Identifier PG Project Gutenberg AJ Apache Jena GFS Grimm’s Fairy Stories NLP Natural Language Processing … Related articles

Towards a better understanding of Burrows’s Delta in literary authorship attribution S Evert, T Proisl, F Jannidis, S Pielström… – … Linguistics for Literature, 2015 – aclweb.org … Authorship attribution has applications e.g. in literary studies, history, and forensics, and uses methods from Natural Language Processing, Text Mining, and Corpus Stylistics. … 2 The collection of French novels con- 2 www.gutenberg.org 81 Page 96. … Cited by 3 Related articles All 13 versions

Finding an appropriate lexical diversity measurement for a small-sized corpus and its application to a comparative study of L2 learners’ writings W Choi, HY Jeong – Multimedia Tools and Applications, 2015 – Springer … Text Corpus from Project Gutenberg available on http://www.gutenberg.org, (2011) 11 … At EONOE, he had worked as a research scientist in natural language processing, language resources management and he had developed the Text-Entry Method for mobile and smart phones … Related articles

Automated Analysis of Narrative Text using Network Analysis in Large Corpora S Sudhahar – 2015 – saatviga.com … Our study is based on the automated analysis of news articles based on the state of the art Natural Language Processing and Artificial Intelligence techniques, to extract information about the key events and relations in the media narrative. The … Cited by 1 Related articles

The detection and analysis of bi-polar phrases and polarity conflicts M Klenner, S Tron, M Amsler… – … Language Processing …, 2015 – books.google.com … noun. For instance, here is a couple of noun phrases taken from the (English) Gutenberg corpus where a F_NEG adjective is combined with a F_POS noun. Tab. 3: Examples of bi-polar phrases: F_NEG-F_POS combinations. … Cited by 3 Related articles

Using character valence in computer generated music to produce variation aligned to a storyline C Featherstone, E Van der Poel – … on South African Institute of Computer …, 2015 – dl.acm.org … Keywords Applications of Machine Learning, Music, Artificial Intelligence, Sentiment analysis, Opinion mining, Natural Language Processing … 4.2 Extracting the named entities Natural Language Processing (NLP) is a large field within computer science. … Related articles All 4 versions

Bilingual reading experiences: What they could be and how to design for them C Pillias, P Cubaud – Human-Computer Interaction, 2015 – Springer … Project (http://www.gutenberg.org). In this picture, we show extracts of the original English version (left) and of a French translation (right). Note the differences in the layout of paragraphs and construction of sentences. Natural Language Processing techniques can now … Cited by 4 Related articles All 9 versions

Embedding Topical Elements of Parallel Programming, Computer Graphics, and Artificial Intelligence across the Undergraduate CS Required Courses. J Wolfer – International Journal of Engineering Pedagogy, 2015 – search.ebscohost.com … Suitable topics include the robotics, haptic, medical imaging, evolutionary computing, art, and natural language processing featured in this paper. … [10] Project Guttenberg Ebook, http://www.gutenberg.org [11] IBM, “Watson Project”, http://www.ibm.com/watson [12] Google … Related articles All 4 versions

Distinguishing Voices in The Waste Land using Computational Stylistics J Brooke, A Hammond, G Hirst – LiLT (Linguistic Issues in …, 2015 – csli-lilt.stanford.edu Page 1. Linguistic Issues in Language Technology – LiLT Submitted, October 2015 Distinguishing Voices in The Waste Land using Computational Stylistics Julian Brooke Adam Hammond Graeme Hirst Published by CSLI Publications Page 2. Page 3. LiLT volume 12, issue 2 … Related articles All 8 versions

Can An Algorithm Be Disturbed?: Machine Learning, Intrinsic Criticism, and the Digital Humanities JE Dobson – College Literature, 2015 – muse.jhu.edu … of The Education of Henry Adams used for these examples was produced by Richard Fane and distributed by Project Gutenberg, and is available at: www.gutenberg.org/cache/epub … “Sentiment Analysis and Subjectivity.” In Handbook of Natural Language Processing, edited by … Related articles All 13 versions

New initiative: the naturalness of software P Devanbu – 2015 IEEE/ACM 37th IEEE International …, 2015 – ieeexplore.ieee.org … B. Porting and Translation Automated statistical translation has been one of the success stories of statistical natural language processing. … ja4F9L 6. http://promisedata.org 7. http://flossmole.org 8. https://github.com/bibanon/bibanon/wiki/Gittorrent 9. https://www.gutenberg.org/ 10 … Related articles All 5 versions

Origins and development of adjectival passives in Spanish C Marco, R Marín – New Perspectives on the Study of Ser and …, 2015 – books.google.com Page 246. Origins and development of adjectival passives in Spanish A corpus study* Cristina Marco1 and Rafael Marín2, 3 1 Gjøvik University College/2CNRS/3Université de Lille 3 To date, it has generally been assumed that most contemporary uses of Spanish estar ‘be. … Related articles All 7 versions

Identifying missing dictionary entries with frequency-conserving context models JR Williams, EM Clark, JP Bagrow, CM Danforth… – Physical Review E, 2015 – APS … to relative word frequencies. Later, though still early on in the history of modern computational linguistics and natural language processing, theory caught up with Shannon’s work. Becker wrote [3] the following. My guess is that … Cited by 2 Related articles All 21 versions

Gibberish speech as a tool for the study of affective expressiveness for robotic agents S Yilmazyildiz, W Verhelst, H Sahli – Multimedia Tools and Applications, 2015 – Springer … In the auditory channel, the majority of studies have focused mainly on utilizing Natural Language Processing technologies. However these technologies today are not robust and sophisticated enough for extensive use in a real-world environment. … Cited by 1 Related articles All 3 versions

PDLK: Plagiarism detection using linguistic knowledge A Abdi, N Idris, RM Alguliyev, RM Aliguliyev – Expert Systems with …, 2015 – Elsevier … It includes three major steps. First, in pre-processing step the basic tasks of natural language processing are done. … 1. Our method includes three main steps. In the first step, pre-processing the basic natural language processing tasks is done. … Cited by 4 Related articles All 7 versions

Abstract Representations of Plot Structure M Elsner – LiLT (Linguistic Issues in Language Technology), 2015 – csli-lilt.stanford.edu … All novels are downloaded from the Project Gutenberg Website (www.gutenberg.org) in raw text form; the Gutenberg header and footer are stripped, as are introductory and concluding material by editors, critics or publishers. … Cited by 3 Related articles

Phrase Detectives M Poesio, J Chamberlain… – Ide, N., and Pustejovsky, J. …, 2015 – dces.essex.ac.uk Page 1. Phrase Detectives Massimo Poesio, Jon Chamberlain and Udo Kruschwitz Abstract In this Chapter we discuss Phrase Detectives, a Game-With-A-Purpose (GWAP) for anaphoric annotation that has been one of the first … Cited by 1 Related articles All 5 versions

Syllabification and parameter optimisation in Zulu to English machine translation G Kotzé, F Wolff – 2015 – uir.unisa.ac.za … translation process. Moreover, in the field of natural language processing (NLP), MT can also be applied to solve other problems, such as cross-lingual information retrieval or for the extrinsic evaluation of more basic tasks. Training … Related articles All 4 versions

Size matters: choosing the most informative set of window lengths for mining patterns in event sequences J Lijffijt, P Papapetrou, K Puolamäki – Data Mining and Knowledge …, 2015 – Springer … of words in natural language corpora have become important concepts in research in linguistics (Gries 2008), natural language processing (Madsen et al. … Pride and Prejudice by Jane Austen, which is freely available through Project Gutenberg (http://www.gutenberg.org/ … Related articles All 10 versions

Using hashtags to capture fine emotion categories from tweets SM Mohammad, S Kiritchenko – Computational Intelligence, 2015 – Wiley Online Library Cited by 50 Related articles All 8 versions

Zipf’s law for word frequencies: Word forms versus lemmas in long texts Á Corral, G Boleda, R Ferrer-i-Cancho – PloS one, 2015 – journals.plos.org Zipf’s law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf’s law should hold in the most natural way, studying its validity for plain word forms … Cited by 7 Related articles All 18 versions

Markov State Space Aggregation via the Information Bottleneck Method BC Geiger – Schedae Informaticae, 2015 – ejournals.eu … identified correctly. 6.2. A Toy Example from Natural Language Processing In this … Communication” [17]. 3 A copy of the text can be obtained from Project Gutenberg: http://www.gutenberg.org/ebooks/2853. Page 11. 55 Tab. 1. Partitions … Related articles All 11 versions

Discourse-centric learning analytics: mapping the terrain S Knight, K Littleton – Journal of Learning Analytics, 2015 – oro.open.ac.uk … from, other research on related topics. What is it that makes something DCLA, rather than natural-language processing, text based machine learning, or some other form of learning analytics? The interest in developing DCLA … Cited by 8 Related articles

Automatic annotation of Latin vowel length J Winge – 2015 – stp.lingfil.uu.se Page 1. Automatic annotation of Latin vowel length Johan Winge Uppsala University Department of Linguistics and Philology Språkteknologiprogrammet (Language Technology Programme) Bachelor’s Thesis in Language Technology June 4, 2015 Supervisor: Joakim Nivre … Related articles

Is there a formula for formulaic language? RS Forsyth, Ł Grabowski – Poznan Studies in Contemporary …, 2015 – degruyter.com … In fact, many different types of sequences deserving the epithet “formulaic” are studied by researchers specializing in language acquisition, psycholinguistics, neurolinguistics, lexicography, computational phraseology, natural language processing, to name but a few fields … Cited by 1 Related articles All 6 versions

Vocabulary And Dementia: Six Novelists I Lancashire – Language Development: The lifespan perspective, 2015 – books.google.com … This essay also does not allow for analysis of how, as dementia increases, maximals include more and more associates. 3. Only a natural-language-processing program can fully extract lexical and grammatical features. Page 91. … Related articles All 2 versions

Automated classification and localization of daily deal content from the Web J Cuzzola, J Jovanović, E Bagheri, D Gašević – Applied Soft Computing, 2015 – Elsevier … in the last column of the table, we make use of lexical knowledge bases, such as WordNet 1 , and natural language processing techniques to … license or are in the public domain and are accessible through repositories such as Project Guttenberg (http://www.gutenberg.org/). … Related articles All 6 versions

A preadapted universal switch distribution for testing Hilberg’s conjecture Ł Dębowski – IEEE Transactions on Information Theory, 2015 – ieeexplore.ieee.org Page 1. 5708 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 61, NO. 10, OCTOBER 2015 A Preadapted Universal Switch Distribution for Testing Hilberg’s Conjecture Łukasz Dębowski Abstract—Hilberg’s conjecture … Cited by 5 Related articles All 7 versions

Big Data–Driven Natural Language–Processing Research and Applications V Gudivada, D Rao, V Raghavan – Big Data Analytics, 2015 – books.google.com Chapter 9 Big Data Driven Natural Language Processing Research and Applications Venkat N. Gudivada*, 1, Dhana Rao†, Vijay V. Raghavan‡ * East Carolina University, Greenville, North Carolina, USA † Marshall University, Huntington, West Virginia, USA ‡ University of … Cited by 6 Related articles

On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges S Jänicke, G Franzini, M Cheema… – Proc. of EuroVis— …, 2015 – academia.edu Page 1. Eurographics Conference on Visualization (EuroVis) (2015) STAR – State of The Art Report R. Borgo, F. Ganovelli, and I. Viola (Editors) On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges … Cited by 9 Related articles All 4 versions

Visions and open challenges for a knowledge-based culturomics N Tahmasebi, L Borin, G Capannini… – International Journal on …, 2015 – Springer … We address all layers needed for knowledge-based culturomics, from natural language processing and relations to summaries and opinions. Keywords. … First layer The first layer of processing (not considering the digitization process) is natural language processing (NLP). … Cited by 5 Related articles All 13 versions

Labeling Educational Content with Academic Learning Standards MF Singer – SIAM … $\frac{1}{N} \sum_{i}^{N} \frac{|G_i \cap S_i|}{|G_i|}$ (5.9) where $G_i$ and $S_i$ are the gold standard labels and system generated labels respectively for document $i$. 21 http://www.wikibooks.org 22 http://www.gutenberg.org 142 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. Page 8. … Related articles All 2 versions

The Haifa corpus of translationese E Rabinovich, S Wintner, OL Lewinsohn – arXiv preprint arXiv:1509.03611, 2015 – arxiv.org … Europarl is probably the most popular parallel corpus in natural language processing, and it was indeed used for many of the translationese tasks surveyed in Section 1. Unfortunately, it is a very problematic corpus. … 7 http://www.gutenberg.org 8 http://en.wikisource.org/ Page 8. … Cited by 2 Related articles All 3 versions

Uncovering highly obfuscated plagiarism cases using fuzzy semantic-based similarity model SM Alzahrani, N Salim, V Palade – … of King Saud University-Computer and …, 2015 – Elsevier … On the other hand, corpus-based methods implement the relationship between the words as derived from large (and standard) text corpora, such as the Penn Treebank Corpus, Brown Corpus, Project Gutenberg corpus, Wikipedia corpus and others. … Related articles All 3 versions

Part-of-Speech Tagging of Source Code Identifiers using Programming Language Context Versus Natural Language Context RS AlSuhaibani – 2015 – rave.ohiolink.edu … the Natural Language Processing (NLP) community. Tagsets in Natural Language Processing are used in corpus. … various other corpuses such as Web Text Corpus, Reuters Corpus and Gutenberg Corpus. Furthermore, it gives the user the ability to load his or her own corpus. … Related articles All 3 versions

Estimating the probability of an authorship attribution J Savoy – Journal of the Association for Information Science …, 2015 – Wiley Online Library Page 1. Estimating the Probability of an Authorship Attribution Jacques Savoy Computer Science Department, University of Neuchatel, Rue Emile Argand 11, Neuchâtel 2000, Switzerland. E-mail: jacques.savoy@unine.ch In … Cited by 5 Related articles

Automatic language identification for metadata records: Measuring the effectiveness of various approaches RC Knudson – 2015 – digital.library.unt.edu Page 1. AUTOMATIC LANGUAGE IDENTIFICATION FOR METADATA RECORDS: MEASURING THE EFFECTIVENESS OF VARIOUS APPROACHES Ryan Charles Knudson BA, MA, MS Dissertation Prepared for the Degree of DOCTOR OF PHILOSOPHY … Related articles All 2 versions

Formalizing common sense reasoning for scalable inconsistency-robust information coordination using Direct Logic™ Reasoning and the Actor Model C Hewitt – Inconsistency Robustness, 2015 – hal.archives-ouvertes.fr Page 1. Formalizing common sense reasoning for scalable inconsistency-robust information coordination using Direct Logic™ Reasoning and the Actor Model Carl Hewitt To cite this version: Carl Hewitt. Formalizing common … Related articles All 4 versions

Meta-effectiveness Excerpts from Cognitive Productivity: Using Knowledge to Become Profoundly Effective LP Beaudoin – sfu.ca Page 1. Meta-effectiveness Excerpts from Cognitive Productivity: Using Knowledge to Become Profoundly Effective Luc P. Beaudoin, Ph.D. (Cognitive Science) Adjunct Professor of Education Adjunct Professor of Cognitive Science … Related articles

Building a Better World with Our Information: The Future of Personal Information Management, Part 3 W Jones – … Lectures on Information Concepts, Retrieval, and …, 2015 – morganclaypool.com … information management, human information behavior, digital libraries, archives and preservation, cultural informatics, information retrieval evaluation, data fusion, relevance feedback, recommendation systems, question answering, natural language processing for retrieval … Cited by 2 Related articles All 4 versions

Approaches to Automatic Text Structuring N Erbs – 2015 – tuprints.ulb.tu-darmstadt.de … To foster future evaluation of natural language processing components for text structuring, we present two prototypes of text structuring systems, which … 1 http://books.google.com/ 2 https://www.gutenberg.org/ 3 E.g. the collaboratively constructed encyclopedia Wikipedia.4 1 … Related articles

Learner Modelling for Individualised Reading in a Second Language M Walmsley – 2015 – researchcommons.waikato.ac.nz … of applications of the computer in language teaching and learning (Levi, 1997, p.1).” iCALL systems use linguistics and natural language processing techniques to … Natural language processing combines computer science with linguistics, and uses …

[BOOK] Aesthetics and design for game-based learning MD Dickey – 2015 – books.google.com Page 1. AESTHETICS AND DESIGN FOR GAME-BASED LEARNING MICHELE D. DICKEY Page 2. “This book is about a vital but neglected aspect of game-based learning: emotionally imbuing participants with motivation and meaning through aesthetic experiences. … Cited by 4 Related articles All 2 versions

[BOOK] Robots that talk and listen: technology and social impact J Markowitz – 2015 – books.google.com … For these digital natives, learning systems equipped with networking functionality and real- time, natural language processing can build a greater sense of independence and control whether the material covered is for school or personal development (eg, creating art, cooking … Cited by 3 Related articles All 2 versions

Improving multilingual sentiment analysis using linguistic knowledge M Di Bari – 2015 – etheses.whiterose.ac.uk … 1 Page 22. 1.1 The problem majority of the studies in Natural Language Processing (NLP), Sentiment analysis has to face the possibility that multiple opinions on different entities occur in the same sentence, sometimes expressing opposing sentiments (Shastri et al., 2010). … Related articles All 3 versions

Composing Measures for Computing Text Similarity D Bär, T Zesch, I Gurevych – 2015 – tuprints.ulb.tu-darmstadt.de … Page 2. Abstract We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. … Related articles All 4 versions