Text-to-Image Systems (Draft)

text to graphics / text to image / text to picture / text to scene / text to video


Text-to-image may also be known as “natural language animation”, “text-to-graphics” or “text-to-scene”. In “Animation 2000++” [PDF], Norman I. Badler states that “natural language animation systems will understand the natural language of motion concepts.” Examples of automatic text-to-graphics conversion systems include CarSim, WordsEye, Micons and SONAS. There are also text-to-scene conversion systems (TTSCS), as well as semantic systems that pull graphics from the web to illustrate narrative stories. For instance, image database APIs, such as Fotolia’s API, could help developers quickly integrate a vast database of images into a text-to-image utility.
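The “pull graphics from the web to illustrate a story” idea can be sketched very simply: extract content words from a sentence, then look each one up against an image database. The sketch below uses a hypothetical in-memory index as a stand-in for a real image-database API (the index contents and file paths are purely illustrative, not from any named service):

```python
import re

# Hypothetical local index standing in for an image-database API
# (e.g., a stock-photo search endpoint); all entries are illustrative.
IMAGE_INDEX = {
    "car": "images/car.jpg",
    "road": "images/road.jpg",
    "tree": "images/tree.jpg",
}

def illustrate(sentence):
    """Naive text-to-picture step: extract lowercase word tokens and
    look up a matching image for each, skipping words with no match."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [IMAGE_INDEX[w] for w in words if w in IMAGE_INDEX]

print(illustrate("A car drove down the road past a tree"))
# → ['images/car.jpg', 'images/road.jpg', 'images/tree.jpg']
```

A production system would replace the dictionary lookup with calls to a search API and would need sense disambiguation, but the keyword-to-image mapping is the core of this class of system.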

There are three basic text-to-image techniques in use today. The first is word clouds, also known as tag clouds or “wordles”, in which a text is represented graphically as a kind of collage of word frequency and relation. The second is text-to-scene conversion, via text-to-scene conversion systems (TTSCS), often in 3D. The third is word-to-image correlation. Word clouds generally rely on the so-called “bag of words” model from natural language processing. There is some backlash against word clouds, though: Jacob Harris, senior software architect at The New York Times, published a piece titled “Word clouds considered harmful” in 2011, likening word clouds to the fad of the “mullet” hairstyle.
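The bag-of-words step behind a word cloud is easy to sketch: tokenize, drop stopwords, count frequencies, then scale counts into font sizes. This is a minimal illustration of the general technique, not any particular tool’s implementation; the stopword list and size range are arbitrary assumptions:

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "as"}

def tag_cloud_weights(text, min_size=10, max_size=48):
    """Compute font sizes for a simple tag cloud using a bag-of-words model.

    Tokenizes into lowercase words, drops stopwords, counts frequencies,
    and scales each count linearly into the [min_size, max_size] range.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    counts = Counter(words)
    if not counts:
        return {}
    top = counts.most_common(1)[0][1]  # highest frequency sets the scale
    return {w: min_size + (max_size - min_size) * c // top
            for w, c in counts.items()}

weights = tag_cloud_weights("cloud cloud cloud word word image")
# 'cloud' gets the largest size, 'image' the smallest
```

Note that word order and syntax are discarded entirely, which is exactly the property Harris’s critique targets: frequency alone says little about meaning.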

There are other, more rudimentary techniques as well, such as image recognition from word clouds, visual text analysis, and verbal visualization using the “WordBridge” [PDF] technique, which composites tag clouds into node-link diagrams to visualize content and relations in text; related work visualizes or identifies metaphors in tag clouds, or even creates “metaphor tag clouds”. Markus Strohmaier (@mstrohm) developed an intentional approach to visual text analysis using “intent tag clouds” [PDF] in 2009. In 2011, Rémi Flamary and colleagues published on clustering recurrent patterns in speech in “Spoken WordCloud” [PDF].

In 2006, Minhua Eunice Ma (@ma_minhua) completed a PhD thesis on the “automatic conversion of natural language to 3D animation” [PDF]. Also in 2006, Richard Johansson wrote about “natural language processing methods for automatic illustration of text” [PDF]. In 2008, Kevin Glass wrote about “automating the creation of 3D animation from annotated fiction text” [PDF]. In 2010, Chris Czyzewicz (@thepolishpistol) published “a survey of text-to-scene applications”. By 2011, Xin Zeng and colleagues had produced “Extraction of Visual Material and Spatial Information from Text Description for Scene Visualization”.


Articulate: Creating Meaningful Visualizations from Natural Language (2003) [PDF], by Yiwen Sun

See also: