See my recent Quora answer on text-to-image systems:
· Is there any website that auto generates video from a novel based text?
No, that would not “be an effective test of human-level AI and natural language processing”….
However, such a reverse generation system could theoretically be built by processing all of YouTube (say, frame by frame) together with the videos' audio-to-text transcripts…. But why would such a system be significant? Conceivably, the human mind converts language to image, and vice versa. This process is likely the origin of metaphor, and metaphor is likely the basis of dreams. So such a system could probably be used not only to generate metaphors, but also to translate or decode both metaphors and dreams.
See also:
· quora.com/search?q=endicott+(dream+OR+dreams)
· quora.com/search?q=endicott+(metaphor+OR+metaphors)