What tools can take a natural language query and convert it into a set of filters?
This is the proverbial “$64,000 Question”. I don’t know everything about this field, and will be happy to learn more myself. I have been investigating natural language tools for the past five years, and can recommend my recent videos on “Open Chatbot Standards for a Modular Chatbot Framework” [1].
There are tags, and there are tags. Not all tag sets are created equal. So, to a large degree, it depends on *how* your material is tagged. Beyond that, it depends on *what* you have tagged. Normally one would tag a natural language corpus, for instance a book or screenplay, for natural language interpretation. Tagged “data”, for instance below the sentence level, is another beast, and would need to be tackled statistically or probabilistically.
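To make that distinction concrete, here is a toy illustration (the tags, labels, and values are all invented for the example) of the two kinds of “tagged” material I mean:

```python
# Toy illustration of the distinction above; all tags and values are invented.

# 1) A natural language corpus tagged for interpretation: whole utterances
#    carry tags an interpreter can filter on.
tagged_corpus = [
    {"text": "To be, or not to be", "speaker": "HAMLET", "act": 3, "scene": 1},
    {"text": "Ay, there's the rub", "speaker": "HAMLET", "act": 3, "scene": 1},
]

# 2) Tagged "data" below the sentence level: individual tokens carry labels,
#    which lends itself to statistical or probabilistic treatment.
tagged_tokens = [
    ("To", "PART"), ("be", "VERB"), (",", "PUNCT"), ("or", "CCONJ"),
    ("not", "PART"), ("to", "PART"), ("be", "VERB"),
]
```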
This gets us into the nitty-gritty. The tool that is required is usually referred to as a natural language *interpreter*. Let’s just take AIML and Alicebot as the prototypical example. AIML is a language, in other words a set of tags, specific to the various Alicebot interpreters. There are “Alicebot” interpreters available now for most common programming languages. AIML and the Alicebot interpreters are so-called “pattern matching” systems. In terms of “filtering”, this is basically how pattern-matching interpreters work: they “filter” on the tagged patterns, simply performing a kind of search.
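To make the “filtering on tagged patterns” idea concrete, here is a minimal sketch of how such an interpreter operates. It is not real AIML and not any actual Alicebot interpreter; the categories and the wildcard handling are invented for illustration.

```python
# Minimal sketch of an AIML-style pattern-matching interpreter.
# NOT real AIML: the categories and wildcard handling are invented for
# illustration. Each category pairs a pattern (with "*" wildcards) with a
# response template; interpreting an input is a filtered search over the
# tagged patterns.

import re

# Hypothetical tag set: (pattern, template) pairs, ordered most specific first.
CATEGORIES = [
    ("HELLO *",           "Hi there! You said: {0}"),
    ("WHAT IS YOUR NAME", "My name is Example Bot."),
    ("I LIKE *",          "Why do you like {0}?"),
    ("*",                 "I am not sure I understand."),  # catch-all fallback
]

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a '*' wildcard pattern into an anchored, case-insensitive regex."""
    parts = [r"(.+)" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    return re.compile(r"^\s*" + r"\s+".join(parts) + r"\s*$", re.IGNORECASE)

def respond(user_input: str) -> str:
    """Filter the tagged patterns and answer with the first template that matches."""
    for pattern, template in CATEGORIES:
        match = pattern_to_regex(pattern).match(user_input)
        if match:
            return template.format(*match.groups())
    return "No pattern matched."  # unreachable while the "*" catch-all is present

if __name__ == "__main__":
    print(respond("Hello world"))      # Hi there! You said: world
    print(respond("I like chatbots"))  # Why do you like chatbots?
    print(respond("qwerty"))           # I am not sure I understand.
```

A real interpreter layers input normalization, recursion, and match-priority rules on top of this, but the core loop is the same filtered search.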
Most chatbot hobbyists use pattern-matching systems, of which there are many examples, often with their own “language” or tag sets. Many of the Loebner Prize Turing test crew also use pattern matching, or hybrid statistical-pattern systems, and are often involved in developing their own language (tag set) and interpreter. So-called “real” AI researchers tend to pooh-pooh even the Loebner Prize developers for using pattern-matching techniques.
The primary alternatives to pattern-matching systems and their “proprietary” tag sets are statistical interpreters; think n-gram models and “latent semantic” analysis. There are not a lot of good examples of turnkey statistical natural language interpreters in common use. Theoretically, statistical interpreters do not depend on tag sets. However, there are hybrid systems which process tags, or patterns, statistically. Patterns per se are just one kind of tag set; increasingly, there are also “semantic” tag sets available.
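For contrast with the pattern-matching sketch above, here is an equally minimal, purely illustrative sketch of the statistical approach to the original question: scoring a query against candidate filters by raw n-gram overlap rather than by matching an authored pattern set. The filter names, descriptions, and threshold are all invented; real statistical interpreters use far richer models.

```python
# Purely illustrative sketch of a statistical approach to mapping a natural
# language query onto filters, with no hand-authored pattern tag set.
# The filters, descriptions, and threshold are invented examples; real
# statistical interpreters (n-gram language models, latent semantic
# analysis) are far more sophisticated than this raw overlap score.

from collections import Counter

def word_ngrams(text: str) -> Counter:
    """Count word unigrams and bigrams in lowercased text."""
    words = text.lower().split()
    grams = Counter(words)               # unigrams
    grams.update(zip(words, words[1:]))  # bigrams, as tuples
    return grams

def overlap_score(query: str, description: str) -> float:
    """Score a candidate filter by the n-gram mass it shares with the query."""
    q, d = word_ngrams(query), word_ngrams(description)
    shared = sum(min(count, d[gram]) for gram, count in q.items())
    return shared / max(1, sum(q.values()))

# Hypothetical filters, each described in plain language rather than tagged patterns.
FILTERS = {
    "price_under_50": "cheap inexpensive under fifty dollars low price",
    "free_shipping":  "free shipping delivery included no shipping cost",
    "highly_rated":   "best top rated highly rated five star reviews",
}

def query_to_filters(query: str, threshold: float = 0.05) -> list:
    """Return filters whose descriptions statistically resemble the query."""
    scores = {name: overlap_score(query, desc) for name, desc in FILTERS.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]

if __name__ == "__main__":
    print(query_to_filters("show me cheap items with free shipping"))
    # -> ['free_shipping', 'price_under_50']
```

The point of the contrast: nothing here was hand-tagged as a pattern; the “filters” are ranked by how much the query statistically resembles their descriptions.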
Most grammar-based NLP tools seem to be more involved on the tagging side than on the interpretation side. As far as I know, there is no good, off-the-shelf natural language interpreter available for semantic web tagging such as RDF. There have been a number of attempts, and some claims; but I’ve seen nothing concrete yet. This is common in the AI world, where the higher one rises, the more nebulous things become, until they literally disappear from reality altogether… You may wish to look at my list of (currently 85) “Theses in AI & NLP” (from the past 10 years) [2].