Can a collection of content and data be curated (altered in a comprehensible way) automatically by programming?

Yes, I’ve been working on this for a good while. Text normalization is essentially the process of transforming raw text into a consistent, usable form. I have found rule-based regex engines the most practical tool for this. I have been trying to convert fragmentary tweets into complete sentences so they can be fed into the knowledge base of a question-answering dialog system, thereby creating a dynamic knowledge base crowd-sourced from tweets. As for Twitter bots, Twitter is very sensitive about accounts copying other people’s tweets, so it’s necessary to alter the content in this way to avoid “copyright” issues.
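
To give a rough idea of what rule-based regex normalization can look like, here is a minimal Python sketch. The rule list, the slang substitutions, and the `normalize` function name are all assumptions made up for illustration; they are not the actual rules I use, and a real system would need far more of them.

```python
import re

# Hypothetical rule set: each entry is (compiled pattern, replacement).
# Rules are applied in order, so the retweet prefix is removed before mentions.
RULES = [
    (re.compile(r"https?://\S+"), ""),             # drop URLs
    (re.compile(r"^RT\s+@\w+:\s*"), ""),           # drop a leading "RT @user:" prefix
    (re.compile(r"@\w+"), ""),                     # drop remaining @mentions
    (re.compile(r"#(\w+)"), r"\1"),                # keep the hashtag word, drop '#'
    (re.compile(r"\bu\b", re.IGNORECASE), "you"),  # expand common shorthand
    (re.compile(r"\br\b", re.IGNORECASE), "are"),
    (re.compile(r"\s+"), " "),                     # collapse runs of whitespace
]


def normalize(tweet: str) -> str:
    """Apply each rule in turn, then tidy the result into a sentence."""
    text = tweet
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    text = text.strip()
    if not text:
        return text
    # Capitalize the first character and make sure the sentence is terminated.
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


print(normalize("RT @bob: u r going to love #python https://example.com/x"))
# prints: You are going to love python.
```

Keeping the rules in an ordered list makes it easy to add, remove, or reorder them as new kinds of fragments turn up. Simple word-level rules like these will misfire on some inputs, so the output still needs checking before it goes into a knowledge base.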