Meta Guide Book Bots - Meta-Guide.com

Notes:

N-grams are sequences of n consecutive words, whereas concgrams are sets of n co-occurring words regardless of position. Concgrams therefore allow both gaps between the words and variation in their order. They are closely related to gappy or skipping n-grams, aka skip-grams (s-grams), which allow gaps but typically preserve word order. Latent Semantic Indexing appears similar to concgramming in that both rely on word co-occurrence. The average English sentence is around 16 words long, and the average tweet (140 characters) is around 13 words. When parsing sentences from books, sentences may be numbered consecutively so that proximity can be calculated from the difference between sentence numbers. When multiple books are used, the sentence number may be prefixed with the book's ISBN to differentiate between volumes. This in effect assigns each sentence its own GUID (globally unique identifier).
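
The distinctions in these notes can be illustrated with a minimal Python sketch. It is not taken from ConcGram or any of the tools listed below. The function names, the fixed token window, the "ISBN:number" identifier format, and the ISBN value itself are all assumptions made for illustration.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concgrams(tokens, n, window):
    """Return sets of n words that co-occur within `window` tokens.

    Gaps are allowed because the words need not be adjacent, and order
    variation is folded together by sorting each combination.
    The window size is an assumption of this sketch.
    """
    found = set()
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        # Pair the first token of the span with every choice of n - 1
        # later tokens from the same span.
        for rest in combinations(span[1:], n - 1):
            found.add(tuple(sorted((span[0],) + rest)))
    return found


def sentence_ids(isbn, sentences):
    """Number sentences consecutively and prefix each number with the ISBN.

    The "ISBN:number" format is hypothetical; any unambiguous separator works.
    """
    return {f"{isbn}:{i}": s for i, s in enumerate(sentences, start=1)}


def proximity(id_a, id_b):
    """Distance in sentences between two IDs, or None across volumes."""
    isbn_a, num_a = id_a.rsplit(":", 1)
    isbn_b, num_b = id_b.rsplit(":", 1)
    if isbn_a != isbn_b:
        return None
    return abs(int(num_a) - int(num_b))


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    print(ngrams(tokens, 2))
    print(sorted(concgrams(tokens, 2, window=3)))
    # Placeholder ISBN, not a real book.
    ids = sentence_ids("9780000000002", ["First sentence.", "Second sentence."])
    print(ids)
```

In this sketch "the ... sat" and "sat on the" both collapse into the single concgram ("sat", "the"), which a plain bigram list would not capture. Sorting the words is only one way to ignore order; a frozenset would serve equally well when repeated words do not matter.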

Resources (Corpus):

Adobe FrameMaker (converts books/ePub into XML)
Altova XMLSpy (XML editor & XSLT processor)
ConcApp Concordancer
ConcGram List Builder
Google Books Ngram Viewer
Microsoft Web N-gram Service (unigram, bigram, trigram, N-gram with N=4)
TopBraid Composer (converts XML into RDF)
WSConcGram (program for finding concgrams, essentially related pairs, triplets, quadruplets)

Resources (Sentence):

Resources (Chatlogs):

Wikipedia:

References:

Keyness in Texts (2010)
ConcGram 1.0: A Phraseological Search Engine (2009)
Corpus linguistics & Concgramming in Verbots and Pandorabots (2008)
From n-gram to skipgram to concgram (2006)
Statistical parsing of English sentences (2006)