Natural Language Processing Toolkits


Notes:

NLTK, Stanford NLP, and OpenNLP are the most widely used of these toolkits, and researchers and practitioners frequently cite them in their work. ClearNLP, FreeLing, LingPipe, MALLET, and TreeTagger have followings in specific industries or communities but are less widely adopted. Most of these toolkits are written in Java; NLTK is written in Python and FreeLing in C++. Each has its own strengths, such as ease of use, efficiency, customizability, or speed.

  • ClearNLP is a natural language processing toolkit that is written in Java and designed to be easy to use.
  • FreeLing is a natural language processing toolkit that is written in C++ and designed to be fast and efficient.
  • LingPipe is a natural language processing toolkit that is written in Java and designed to be easy to use and highly customizable.
  • MALLET (MAchine Learning for LanguagE Toolkit) is a machine learning toolkit that is written in Java and designed to be used for natural language processing tasks.
  • NLTK (Natural Language Toolkit) is a natural language processing toolkit that is written in Python and designed to be easy to use and highly modular.
  • OpenNLP is a natural language processing toolkit that is written in Java and designed to be easy to use and highly efficient.
  • Stanford NLP is a natural language processing toolkit that is written in Java and developed at Stanford University.
  • TreeTagger is a part-of-speech tagging and lemmatization tool that was developed by Helmut Schmid, is distributed as precompiled binaries, and is designed to be fast and efficient.
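
As a rough illustration of the modular style described for NLTK above, the following minimal sketch chains a tokenizer, a stemmer, an n-gram helper, and a frequency counter. The function name analyze and the sample sentence are made up for this example, and only NLTK components that need no downloaded corpora are used, assuming NLTK 3 is installed.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams


def analyze(text):
    """Run a tiny pipeline over text (hypothetical helper, not part of NLTK)."""
    # Regex-based tokenizer: splits on whitespace and separates punctuation.
    tokens = wordpunct_tokenize(text.lower())
    # Keep alphabetic tokens only, dropping punctuation and numbers.
    words = [t for t in tokens if t.isalpha()]
    # Reduce each word to its Porter stem.
    stemmer = PorterStemmer()
    stems = [stemmer.stem(w) for w in words]
    return {
        "tokens": tokens,
        "stems": stems,
        "bigrams": list(ngrams(words, 2)),  # adjacent word pairs
        "freq": FreqDist(words),            # word -> count mapping
    }


if __name__ == "__main__":
    result = analyze("Toolkits tag text. Toolkits parse text.")
    print(result["tokens"])
    print(result["bigrams"])
    print(result["freq"].most_common(2))
```

Each step is an independent component, so any one of them can be swapped (for example, a different tokenizer or stemmer) without touching the rest. The Java toolkits listed above expose comparable components through their own APIs, which are not shown here.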

Resources:

See also:

Apache OpenNLP 2011 | Apache OpenNLP 2012 | Apache OpenNLP & Dialog Systems | Best Apache OpenNLP Videos | Best NLTK Videos | Best Stanford NLP Videos | LingPipe & Dialog Systems | MALLET & Dialog Systems | NLTK & Chatbots | NLTK & Dialog Systems | OpenCCG (OpenNLP CCG Library) | Stanford NLP & Dialog Systems | Stanford Parser & Dialog Systems | Stanford Tregex


clearnlp300814

  • clir/clearnlp .. Fast and robust NLP components implemented in Java.
  • baojie/clearnlp .. A fork of ClearNLP https://code.google.com/p/clearnlp/

freeling240814

  • insideout10/wordlift-stanbol .. InsideOut10 contributed libraries for Apache Stanbol. They include a Freebase Entity Search engine and a Freeling Language Identifier and PoS Tagging engine.
  • Ram-Z/hexdame .. Hexdame (or HexDame) is a strategy board game for two players invented by Christian Freeling in 1979. The game is a literal adaptation of the game International draughts to a hexagonal gameboard.

lingpipe240814

  • hvtuananh/lingpipe .. Clone version of LingPipe 4.1.0, with support for unsupervised training
  • iampratiksarkar/Twitminer .. Contains the codes for Twitminer 2014. It implements the naive bayes classifier of the Lingpipe toolkit to classify among sports and politics tweets based on hashtags, keywords and then n-gram models.
  • gwpantazes/author-analyzer .. Author Analyzer using multiple programming tools such as java, Lingpipe, Processing, D3, and more…
  • oaqa/lingpipe-models .. Models provided officially by LingPipe at http://alias-i.com/lingpipe/web/models.html.
  • karlmutch/WebAlgo-Java-Class .. Because most of the code I write is closed source and I wanted to give others a peek into my Java world. So, I have available, upon request, source code and documentation from a Web Algorithms class, part of a Java Certification track at UCB, done while I was independently learning Java. Granted it…

mallet240814

  • rforshaw/MalletEngine .. A powerful entity-component game-engine and editor, based on a modular design. Available on Linux, Windows, Mac, Android, and iOS.
  • JULIELab/jules-mallet-2 .. A lightly modified version of MALLET 2.0-rc-1. Some more optimization iterations are done for some algorithms.
  • mimno/anchor .. Mallet-compatible anchor-based topic model
  • Nineza/Mallet .. A Bukkit plugin, designed to track player interactions, breaks, chats, etc
  • shalomeir/mallet-2.0.7.1 .. mallet-2.0.7.1 is modified from mallet-2.0.7 for Multi-Label parallel processing DMR Topic Model.
  • mamihackl/NaiveBayes .. Statistical methods in NLP: Multi-variate Bernoulli NB, Mallet Naive Bayes, Multinomial NB model
  • cran/mallet .. A wrapper around the Java machine learning tool MALLET
  • ljo/exist-mallet .. Integrates the Mallet Machine Learning and Topic Modeling library into eXist-db
  • jattenberg/MALLET .. A clone of the UMASS machine learning library hosted in a 21st century version control
  • chonger/ScalaFrontend .. Frontends for Java NLP packages such as the berkeley parser, stanford parser, opennlp, and mallet in scala
  • j-kan/malletfn .. Clojure code for working with MALLET (http://mallet.cs.umass.edu)
  • vvcephei/mallet .. applying some architecture to the topic modelling package

nltk240814

  • ly08096/Spam-Filtering-Design .. Using 2002 spam email logs from Enron as training data, and 2014 healthcare tweets from Coursolve as testing data, do Spam Filtering using Scikit Learn and NLTK.
  • mpett/nltk_nlp .. A repo containing some examples and exercise solutions from O’Reilly’s book on NLP with Python
  • chena/text-proc-craig .. Text processing with NLTK and building vector space model for collection of documents.
  • paulboal/job-description-nltk .. A job description classification project based on the need to “find other people with jobs similar to some that I’ve manually tagged.”
  • sh4wn/expensum .. Web application to manage your expenses and savings. It uses NLTK to predict categories for imported expenses.
  • iwangu/reviewvibe .. Reviewvibe helps people to compare Amazon products. It grabs reviews from Amazon using kimonolabs and does sentiment analysis on them using NLTK.
  • tilius/author-profiler .. Author profiler for a corpus of blog entries, written in Python with nltk & libsvm
  • joy13/youtube-nlp .. Natural Language Processing: Sentiment analysis of YouTube comments using NLTK
  • Automotron/namebot .. A company/product name generating tool written in Python. Uses NLTK and diverse wordplay techniques for sophisticated word generation and ideation – also implements data metrics visualized via d3.js.
  • SAFeSEA/pyEssayAnalyser .. An essay Analyser & Summariser, using Flask for the API and NLTK for the language processing
  • r4j4531v4n/sherlock .. A text mining application written in Python. Uses the NLTK and TextBlob packages
  • decause/presinaug .. Analyzing every US Presidential Inaugural Address with NLTK, pygal, word_cloud
  • sfu-natlang/xtag-nltk .. Python code to read, display and parse with the XTAG English (and other) grammars as part of the NLTK project.
  • SamuraiT/nltk3-alpha .. Repo created for pip install. The original project is at http://www.nltk.org/nltk3-alpha/
  • Buntstift/pyuima .. wrapper for NLP author profiling using the nltk framework and pandas
  • sacry-/NLP .. Natural Language Processing with the nltk
  • telson/nltk .. http://www.nltk.org/book/ch01.html#sec-automatic-natural-language-understanding
  • Yuege/movie-review-analysis .. This is a tiny project which will show you how to use Naive Bayes and SVM methods to analyze the movie reviews. We will use NLTK and sklearn python packages for computing.
  • hlin117/character-feature .. This program finds a paragraph in text that tries to best represent a character in literature. I built this program around the story The Once and Future King using the python NLTK.
  • Herka/SentimentAnalysis .. Includes different parsers for collecting data such as movie reviews to train an NLTK classifier to detect positive/negative sentiment.
  • neelk07/CS398VL .. Visualizing Literature Projects – Python NLTK and D3.js
  • natxty/mm .. Python building-block scripts, leveraging NLTK, used for a client’s matching algorithm
  • darshan95/NLTK-Project .. Given any query related to the database (India-New Zealand cricket series), the program answers
  • jaredks/graf-nltk .. A subclass of NLTK’s CorpusReader and instructions for modifying NLTK to include this development code. Copied and modified from https://github.com/cidles/graf-python

opennlp240814

  • AlexPoint/OpenNlp .. C# port of the Java OpenNLP tools retrieved from http://sharpnlp.codeplex.com/
  • JULIELab/jules-opennlp-postag-ae .. UIMA wrapper for the OpenNLP Part of Speech-Tagger. Uses the JULES type system and is thus compatible to other JULES components.
  • rbsdev/CRSE .. Context Relevant Search Engine using Lucene, Ontologies and OpenNLP
  • amb-enthusiast/PersonCoreferenceAnnotator .. A UIMA Annotator (including type system) which annotates sentences, tokens, Person named entities and coreference resolution “mentions”. Dev uses OpenNLP coreference and named entity recognition tools, within an apache UIMA Annotator analysis engine.
  • netconstructor/natural-language-framework .. Natural Language Framework is intended to be a collection of bindings for Ruby and provide access to general purpose NLP components. OpenNLP, GATE components, standalone tools (TreeTagger, Stanford Parser, etc.) will be accessible through NLFW.
  • parag/kwegg .. The futuristic news reader. Uses OpenNLP to extract noun phrases from every line and then converts them to speech.
  • AhuraLab/NLPR .. Plugin for OpenNLP to measure text readability.
  • spatzle/jruby_sinatra_nlp .. A jruby sinatra based webservice app for opennlp. If you’re interested in the java servlet, it’s here https://gist.github.com/spatzle/1104702
  • t6d/opennlp .. A minimal Ruby wrapper around Apache’s OpenNLP library
  • chonger/ScalaFrontend .. Frontends for Java NLP packages such as the berkeley parser, stanford parser, opennlp, and mallet in scala
  • naush/signster .. NLP parser web service based on OpenNLP via clojure-opennlp.

stanfordnlp240814

  • guokr/stan-cn-nlp .. stan-cn-nlp: an API wrapper based on Stanford NLP packages for the convenience of Chinese users
  • cngo-github/tex-storm .. An implementation of an NLP system built around the Storm project based on what I’ve learned working with GATE, Stanford NLP, and UIMA and its associated projects.
  • adaam2/News-Before-It-Happens .. This is my Computer Science MSc final project. My aim is to try and correlate mentions of keywords in tweets within a certain area to try and find any news stories out before mainstream media channels report them. Built with node.js, express.js, sockets.io, angular.js, the Stanford NLP parser and of…
  • rletters/nlp-tool .. Simple Java CLI tool wrapping the Stanford NLP for use with RLetters
  • asheba/NLP .. A repository for the Stanford NLP class.
  • alee101/socher-wsd .. Word-sense disambiguation built on Stanford NLP’s sentiment treebank
  • mothsoft/stanford-nlp-war .. Provides a REST resource, JMS queue listener, and basic HTML form for making natural language processing parse requests using the Stanford CoreNLP libraries.
  • dfhoughton/StanfordCFG .. uses Stanford NLP’s NLP tools plus my Grammar library to facilitate constructing rule-based CFG parsers for natural language
  • avoistinov/Stanford-NLP .. This repository holds the code for quizzes and programming assignments related to the Stanford NLP (Natural Language Processing) course

treetagger240814

  • nyxtom/treetagger .. Node.js module for interfacing with the TreeTagger toolkit by Helmut Schmid.
  • korpling/treetagger-emf-api .. This project provides an EMF (see: http://www.eclipse.org/modeling/emf/) meta model and a Java API for the TreeTagger data format produced and consumed by the linguistic tagging tool TreeTagger by Helmut Schmid (see: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/).
  • netconstructor/natural-language-framework .. Natural Language Framework is intended to be a collection of bindings for Ruby and provide access to general purpose NLP components. OpenNLP, GATE components, standalone tools (TreeTagger, Stanford Parser, etc.) will be accessible through NLFW.