
Meta-Guide.com


What is a good way to strip a text of language-independent punctuation, like !, ?, and emoticons before trying for language detection?

Posted on 2012/09/25 (updated 2015/11/24) by mendicott

This is usually called text normalization [1]. See Vineet Yadav’s answer to my Quora question, “How would you make an API that converts any tweet into a proper English sentence?” In practice, I use the Regex module in Yahoo! Pipes to do this.
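For readers without Yahoo! Pipes, the same kind of regex-based cleanup can be sketched in Python. This is only an illustration, not the rules I run in Pipes: the function name `strip_language_independent`, the two patterns, and the choice to keep apostrophes are all assumptions made for the example. It removes URLs, @mentions, and common ASCII emoticons, then drops any character whose Unicode general category is punctuation (P*) or symbol (S*), which covers most emoji.

```python
import re
import unicodedata

# Hypothetical patterns for illustration; tune them to your own data.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# Western-style ASCII emoticons such as :) ;-( :D =P :'(
ASCII_EMOTICON = re.compile(r"(?<!\w)[:;=8][\-*']?[)\](\[dDpPoO/\\|}{@3]+(?!\w)")
# Assumption: apostrophes carry language signal (e.g. "C'est"), so keep them.
KEEP = {"'", "\u2019"}


def strip_language_independent(text: str) -> str:
    """Remove URLs, mentions, emoticons, punctuation and symbols from text."""
    text = unicodedata.normalize("NFC", text)
    text = URL_OR_MENTION.sub(" ", text)
    text = ASCII_EMOTICON.sub(" ", text)
    cleaned = []
    for ch in text:
        # General categories starting with P are punctuation, S are symbols
        # (most emoji are category "So").
        if ch not in KEEP and unicodedata.category(ch)[0] in "PS":
            cleaned.append(" ")
        else:
            cleaned.append(ch)
    # Collapse the runs of whitespace left behind by the removals.
    return " ".join("".join(cleaned).split())


if __name__ == "__main__":
    print(strip_language_independent("Wow!!! C'est génial :D http://t.co/abc @bob 😀"))
    # prints: Wow C'est génial
```

The emoticon pattern runs before the punctuation pass because emoticons such as :D would otherwise leave a stray letter behind. The cleaned string can then be handed to whatever language detector you use.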

[1] http://en.wikipedia.org/wiki/Text_normalization


Contents of this website may not be reproduced without prior written permission.

Copyright © 2011-2025 Marcus L Endicott
