In corpus linguistics, collocation refers to the mere co-occurrence of words. Harmful sentences can be filtered based on three-word co-occurrence. To induce semantic classes, similarity can be measured from co-occurrence probabilities. Content may be created from summarized text and from keywords extracted from documents by an algorithm based on term co-occurrence. For extending large multi-domain ontologies, combining ontology content, structure and co-occurrence information is more beneficial than using only content, only co-occurrence or only concept denotation information.
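
As a rough illustration of co-occurrence-based similarity, the sketch below builds a co-occurrence vector for each word from a small sliding window and compares two words by the cosine of their vectors. The window size, the whitespace tokenizer, the toy sentences and the function names are all assumptions made for the example. It uses raw counts rather than estimated probabilities; this does not affect the cosine, which is unchanged when a vector is rescaled.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for illustration only.
sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]
vectors = cooccurrence_vectors(sentences)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
```

In this toy corpus "cat" and "dog" share most of their neighbours, so their similarity comes out higher than that of "cat" and "milk"; words that end up with similar vectors are candidates for the same semantic class.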

Summaries and representations of relationships between documents can be generated automatically based on the co-occurrence of named entities and on clustering results. The choice of action verbs can be based solely on the co-occurrence statistics encoded in a template-based generator for multimodal dialog systems. Statistical models can be based on co-occurrence measurements. Speech recognition errors may be detected based on semantic knowledge, constraint rules and statistical modeling, i.e., pointwise mutual information and co-occurrence analysis. Semantically different forms of multi-functionality may be represented by the co-occurrence of dialog acts in different types of dialog.
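
Pointwise mutual information compares how often two words occur together with how often they would be expected to if they were independent: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The following is a minimal sketch, assuming that the probabilities are estimated as the fraction of sentences containing a word or a word pair; the function name, the tokenizer and the example sentences are hypothetical and are not taken from any of the systems mentioned above.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pmi_table(sentences):
    """Return PMI scores for every word pair that co-occurs in at least one sentence."""
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Count each word once per sentence; sorting gives pairs a canonical order.
        words = sorted(set(sentence.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    scores = {}
    for (x, y), joint in pair_counts.items():
        p_xy = joint / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        scores[(x, y)] = log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus, invented for illustration only.
sentences = [
    "speech recognition errors",
    "speech recognition works",
    "errors are common",
    "dialog systems are common",
]
scores = pmi_table(sentences)
pmi_strong = scores[("recognition", "speech")]  # always appear together
pmi_neutral = scores[("errors", "speech")]      # together as often as chance predicts
```

A positive score means the pair co-occurs more often than independence would predict, and a score near zero means the pair behaves as if independent. Pairs that never co-occur are left out of the table, since their PMI would be the logarithm of zero.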

A popular method to estimate co-occurrence, called “co-occurrence in snippets”, is to pose conjunctive queries containing both terms to a web search engine. One system was designed using co-occurrence between the words in a news article and emotion words. For example, when people express their emotions in text, the association feature formed by a two-term combination, e.g. “myself” and “feeling”, has a high frequency of co-occurrence in sentences. Even emoticons can be automatically annotated according to their co-occurrence in a database. In robot navigation, a landmark component can ground novel noun phrases such as “the computers” in the perceptual frame of the robot by exploiting object co-occurrence statistics between unknown noun phrases and known perceptual features.

