  • neubig/kytea .. The Kyoto Text Analysis Toolkit for word segmentation and pronunciation estimation, etc.
  • htaunay/TClass .. A framework for text classification, evaluation, segmentation, and model application, built with machine-learning algorithms based on vector representations of documents.
  • contours/textseg .. An experiment comparing the performance of text segmentation algorithms.
  • eirikrwu/tinybrain .. This repository contains the complete source code that we used to conduct experiments in the paper: Text Window Denoising Autoencoder: Building Deep Architecture for Chinese Word Segmentation.
  • Qnan/Splitext .. Chen & Wu multi-plane text segmentation with CUDA
  • fnl/segtok .. scripts to pre-process plain text: sentence segmentation, tokenization, and stemming
  • mikejs/usegment .. Python implementations of the Unicode text segmentation algorithms
