How hard would it be to use automatic summarization to summarize one’s tweets?
I have spent years working on this, and I have written a good bit about it on Quora already; see below. It would be a LOT easier if people wrote correct English in tweets; however, tweets are by and large gibberish. And, with all the various kinds of retweeting, they require a massive amount of de-duplication. De-duplication is needed both before and after normalization, which quickly turns this into a BIG data problem. (My definition of BIG data: too big for you to do by yourself on your own hardware.) Normalization is, in effect, converting gibberish into proper English, including translating SMS-speak and Twitter-ese.
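To make that pipeline concrete, here is a minimal Python sketch of the de-duplicate → normalize → de-duplicate-again flow. Everything in it is a placeholder of my own: the tiny `SMS_LEXICON`, the regexes, and exact-match de-duplication all stand in for far heavier machinery (a curated or learned lexicon with thousands of entries, and near-duplicate detection such as MinHash over shingles).

```python
import re

# Hypothetical lexicon: a real normalizer needs thousands of SMS-speak
# and Twitter-ese entries, hand-curated or learned from data.
SMS_LEXICON = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow"}
LEXICON_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, SMS_LEXICON)) + r")\b", re.IGNORECASE
)

def normalize(tweet: str) -> str:
    """Very rough sketch of turning Twitter-ese into proper English."""
    text = re.sub(r"^RT\s+@\w+:?\s*", "", tweet, flags=re.IGNORECASE)  # retweet prefix
    text = re.sub(r"https?://\S+", "", text)   # URLs carry no summary content
    text = re.sub(r"[@#](\w+)", r"\1", text)   # unwrap mentions and hashtags
    text = LEXICON_RE.sub(lambda m: SMS_LEXICON[m.group(0).lower()], text)
    return re.sub(r"\s+", " ", text).strip()

def dedupe(tweets):
    """Exact-match de-duplication; production systems need near-duplicate
    detection, since retweets rarely match character for character."""
    seen, out = set(), []
    for t in tweets:
        if t.lower() not in seen:
            seen.add(t.lower())
            out.append(t)
    return out

raw = [
    "RT @alice: u r gr8! http://t.co/abc",
    "u r gr8! http://t.co/xyz",
    "see u 2moro #nlproc",
]
# De-duplicate both BEFORE and AFTER normalization: normalization collapses
# surface variants (RT prefixes, different short URLs) into fresh duplicates.
tweets = dedupe([normalize(t) for t in dedupe(raw)])
print(tweets)  # ['you are great!', 'see you tomorrow nlproc']
```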
Then you need to decide which summarization algorithms best serve your purpose. In my case, this quickly moved from automatic summarization to Natural language generation; in other words, you need to build something up in order to break it down. Given that anything can be said a thousand different ways on Twitter, not to mention in myriad languages (see also Code-mixing), this Rubik’s Cube on steroids can become a nightmare scenario. IMHO, without a decent-sized team and reasonable funding, this is a moderately hard task (which is not to say impossible). 🙂
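For flavor, here is a toy frequency-based extractive scorer over tweets that have already been normalized and de-duplicated. This is my own simplistic illustration of the extractive starting point, not the NLG approach I describe above; the stop-word list and scoring are deliberately naive, and a real pipeline would first cluster paraphrases, precisely because the same thing is said a thousand different ways.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "on"}

def summarize(tweets, k=3):
    """Toy extractive summarizer: score each tweet by the average corpus
    frequency of its content words, then keep the top k."""
    docs = [[w for w in t.lower().split() if w not in STOPWORDS] for t in tweets]
    freq = Counter(w for doc in docs for w in doc)  # corpus word frequencies

    def score(doc):
        return sum(freq[w] for w in doc) / len(doc) if doc else 0.0

    ranked = sorted(zip(tweets, docs), key=lambda p: score(p[1]), reverse=True)
    return [t for t, _ in ranked[:k]]

corpus = [
    "the new phone battery lasts two days",
    "battery life on the new phone is impressive",
    "camera quality disappoints reviewers",
]
# Picks the two tweets densest in frequently repeated content words.
print(summarize(corpus, k=2))
```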
See also my Quora answers to:
- What are the current sub problems that I could address in summarization, or what would you like a summarizer to do for you?
- Why is it not a good idea to generate paragraphs from a Twitter stream about a selected topic?
- Is there any NLP library/tool/API for cleansing noisy text (e.g. SMS, Twitter) in English to its correct text?
- What are the best blogs talking about natural language processing?
- Can collection of content and data be curated (altered in a comprehensible way) automatically by programming?
- What’s the best site to find articles about artificial intelligence?