There has already been a good deal of discussion on Quora about text-to-image systems, but less about image-to-text systems per se.
In particular, see my Quora answers to:
- Is there any website that auto-generates video from a story-based text?
- How are metaphors handled in AI / NLP / ML?