Meta-Guide.com


What is a good way to strip a text of language-independent punctuation, like !, ?, and emoticons before trying for language detection?

Posted on 2012/09/25 (updated 2015/11/24) by mendicott


This is usually referred to as text normalization [1]. See Vineet Yadav’s answer to my Quora question, “How would you make an API that converts any tweet into a proper English sentence?” In practice, I use the Regex module in Yahoo! Pipes to do this.
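For readers not using Yahoo! Pipes, the same regex-style cleanup might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the rules I run in Pipes: the function name strip_language_independent and the emoticon pattern are hypothetical, and the pattern covers only a handful of common ASCII emoticons. It drops every character whose Unicode category is punctuation or symbol (which includes most emoji), but keeps apostrophes inside words, since those tend to be language-specific.

```python
import re
import unicodedata

# Illustrative, incomplete pattern for ASCII emoticons such as :) ;-) :D =P
# (an assumption for this sketch, not an exhaustive list).
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][\-o*']?[)\](\[dDpP/\\|}{]+(?!\w)")


def strip_language_independent(text):
    """Remove emoticons, punctuation and symbols; keep letters and digits."""
    # Remove ASCII emoticons first, while their punctuation is still intact.
    text = EMOTICON_RE.sub(" ", text)
    kept = []
    for i, ch in enumerate(text):
        # Keep apostrophes that sit between two letters (don't, l'homme).
        in_word = (
            ch in "'\u2019"
            and 0 < i < len(text) - 1
            and text[i - 1].isalpha()
            and text[i + 1].isalpha()
        )
        # Unicode categories P* are punctuation, S* are symbols (most emoji).
        is_noise = unicodedata.category(ch)[0] in "PS"
        kept.append(" " if is_noise and not in_word else ch)
    # Collapse the runs of spaces left behind by the removals.
    return " ".join("".join(kept).split())


print(strip_language_independent("Hello!!! How are you? :-)"))
```

The cleaned string can then be passed to whatever language detector you use. Emoji sequences that rely on joiners or variation selectors may leave invisible characters behind, so treat this as a starting point.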

[1] http://en.wikipedia.org/wiki/Text_normalization
