Text-to-Image & Natural Language 2015


Notes:

  • Imagistic modeling
  • Scene generation
  • Scene retrieval
  • Synthetic image
  • Text-graphic generation
  • Text-to-animation
  • Text-to-3D
  • Text-to-image conversion
  • Text-To-Scene Conversion System (TTSCS)
  • Text-to-video
  • Visualizer

Resources:

See also:

Text-to-Image & Natural Language 2014


Deep visual-semantic alignments for generating image descriptions A Karpathy, L Fei-Fei – Proceedings of the IEEE Conference on …, 2015 – cv-foundation.org … edu Abstract We present a model that generates natural language descriptions of images and their regions. Our approach … mance. We quantify this comparison in our experiments. Grounding natural language in images. A number … Cited by 604 Related articles All 14 versions

Vqa: Visual question answering S Antol, A Agrawal, J Lu, M Mitchell… – Proceedings of the …, 2015 – cv-foundation.org … 2{memitc, larryz}@microsoft.com Abstract We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. … Cited by 206 Related articles All 18 versions

Ask your neurons: A neural-based approach to answering questions about images M Malinowski, M Rohrbach, M Fritz – Proceedings of the IEEE …, 2015 – cv-foundation.org … In contrast to previous efforts, we are facing a multi-modal problem where the language output (answer) is conditioned on visual and natural language input (image and question). … Grounding of natural language and visual concepts. … Cited by 91 Related articles All 13 versions

Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models BA Plummer, L Wang, CM Cervantes… – Proceedings of the …, 2015 – cv-foundation.org … We present experiments demonstrating the usefulness of our annotations for text-to-image reference resolution, or the task of localizing textual entity … which combines both of these goals, has received a lot of attention in both computer vision and natural language processing … Cited by 42 Related articles All 13 versions

Associating neural word embeddings with deep image representations using fisher vectors B Klein, G Lev, G Sadeh, L Wolf – Proceedings of the IEEE …, 2015 – cv-foundation.org … This set is converted to a Fisher Vector based on one of the distributions: GMM, LMM, or HGLMM. Text to image matching is done using the Canonical Correlations Analysis algorithm [12]. This combination of methods proves to be extremely potent. 2. Previous work … Cited by 25 Related articles All 10 versions

Don’t just listen, use your imagination: Leveraging visual common sense for non-visual tasks X Lin, D Parikh – Proceedings of the IEEE Conference on …, 2015 – cv-foundation.org … visual common sense in our proposed FITB and VP tasks requires qualitatively a similar level of image understanding as in image-to-text and text-to-image tasks. … Natural language Q&A: Answering factual queries in natural language is a well studied problem in text retrieval. … Cited by 26 Related articles All 13 versions

Natural language object retrieval R Hu, H Xu, M Rohrbach, J Feng, K Saenko… – arXiv preprint arXiv: …, 2015 – arxiv.org … In the following we discuss these related areas. Natural language object retrieval. … [20] uses a structure prediction model based on Markov Random Field (MRF) to align text to image and reasons about object coreference in text for 3D scene parsing. Image Captioning. … Cited by 27 Related articles All 4 versions

Aligning books and movies: Towards story-like visual explanations by watching movies and reading books Y Zhu, R Kiros, R Zemel, R Salakhutdinov… – Proceedings of the …, 2015 – cv-foundation.org … This source of knowledge, however, does not come with associated visual information that would enable us to ground it with natural language. … For text-to-image alignment, [17, 8] find correspondences between nouns and pronouns in a caption and visual objects using several … Cited by 45 Related articles All 9 versions

Ranking and retrieval of image sequences from multiple paragraph queries G Kim, S Moon, L Sigal – 2015 IEEE Conference on Computer …, 2015 – ieeexplore.ieee.org … of combinations of different model components in a unified way, including text segmentations, text descriptors, and text-to-image mapping methods. … Our work is unique in two aspects; first, our query structure is natural language paragraphs, and second, the retrieval targets are … Cited by 7 Related articles All 7 versions

Visual7w: Grounded question answering in images Y Zhu, O Groth, M Bernstein, L Fei-Fei – arXiv preprint arXiv:1511.03416, 2015 – arxiv.org … 6). Due to the large performance gap between human and machine, we envision our dataset and visually grounded QA tasks to contribute to a long-term joint effort from several communities such as vision, natural language processing and knowledge to close the gap together. … Cited by 42 Related articles All 5 versions

Grounding of textual phrases in images by reconstruction A Rohrbach, M Rohrbach, R Hu, T Darrell… – arXiv preprint arXiv: …, 2015 – arxiv.org … 1. Introduction Language grounding in visual data is an interesting problem studied both in computer vision [16, 17, 19, 26] and natural language processing communities [20, 25]. … More specifically, we aim to localize arbitrary natural language noun phrases in images. … Cited by 21 Related articles All 3 versions

Deep compositional question answering with neural module networks J Andreas, M Rohrbach, T Darrell, D Klein – arXiv preprint arXiv: …, 2015 – arxiv.org … We answer natural language questions about images using collections of jointly-trained neural “modules”, dynamically composed into deep networks based on linguistic structure. … w is a natural-language question • x is an image • y is an answer … Cited by 25 Related articles All 3 versions

Expressing an image stream with a sequence of natural sentences CC Park, G Kim – Advances in Neural Information Processing …, 2015 – papers.nips.cc … Our objective is, given a photo stream, to automatically produce a sequence of natural language sentences that best describe the essence of the input image … They propose a latent structural SVM framework to learn the semantic relevance relations from text to image sequences. … Cited by 5 Related articles All 5 versions

Alignment of eye movements and spoken language for semantic image understanding P Vaidyanathan, E Prud’hommeaux, CO Alm, JB Pelz… – IWCS 2015, 2015 – aclweb.org … This paper reports on a novel approach for semantically annotating important regions of an image with natural language descriptors. … What are you talking about? Text-to-image coreference. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3558–3565. … Cited by 3 Related articles All 8 versions

Listen, attend, and walk: Neural mapping of navigational instructions to action sequences H Mei, M Bansal, MR Walter – arXiv preprint arXiv:1506.04089, 2015 – arxiv.org … Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. … Cited by 12 Related articles All 8 versions

Cross-document event coreference resolution based on cross-media features T Zhang, H Li, H Ji, SF Chang – … Methods in Natural Language …, 2015 – ee.columbia.edu … 2014. What are you talking about? text-to-image coreference. … In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 489–500. … Cited by 6 Related articles All 11 versions

Generating multi-sentence lingual descriptions of indoor scenes D Lin, C Kong, S Fidler, R Urtasun – arXiv preprint arXiv:1503.00064, 2015 – arxiv.org … Whereas a majority of work on image understanding focuses on class-based annotation, we believe, however, that describing an image using natural language is still the best way to show one’s understanding. The task of … Cited by 2 Related articles All 4 versions

Movieqa: Understanding stories in movies through question-answering M Tapaswi, Y Zhu, R Stiefelhagen, A Torralba… – arXiv preprint arXiv: …, 2015 – arxiv.org … This is in large part due to efforts in large-scale data collection such as Microsoft’s COCO [19], Flickr30K [40] and Abstract Scenes [44] providing tens to hundreds of thousand images with natural language captions. … Memory Network for natural language answers. … Cited by 21 Related articles All 8 versions

Generating multi-sentence natural language descriptions of indoor scenes D Lin, S Fidler, C Kong… – British Machine Vision …, 2015 – pdfs.semanticscholar.org … 1 Generating Multi-sentence Natural Language … Whereas a majority of work on image understanding focuses on class-based annotation, we believe, however, that describing an image using natural language is still the best way to show one’s understanding. … Cited by 3 Related articles All 5 versions

Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries Y Zhu, C Zhang, C Ré, L Fei-Fei – arXiv preprint, 2015 – pdfs.semanticscholar.org … It is followed by visual question answering [1, 14, 31, 32, 53], which aims at answering natural language questions based on image content. … From a user’s perspective, the input to this system is a natural language question along with a set of one or more images. … Cited by 5 Related articles All 2 versions

Sentence directed video object codetection H Yu, JM Siskind – arXiv preprint arXiv:1506.02059, 2015 – arxiv.org Page 1. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 1 Sentence Directed Video Object Codetection Haonan Yu, Student Member, IEEE and Jeffrey Mark Siskind, Senior Member, IEEE Abstract … Cited by 2 Related articles All 3 versions

Learning Deep Structure-Preserving Image-Text Embeddings L Wang, Y Li, S Lazebnik – arXiv preprint arXiv:1511.06078, 2015 – arxiv.org … and the structure-preserving objective function, achieve significant improvements in accuracy for image-to-text and text-to-image retrieval. … from predicting discrete, categorical labels to generating rich descriptions of visual data, for example, in the form of natural language. … Cited by 12 Related articles All 5 versions

A Text to Image Story Teller Specially Challenged Children-Natural Language Processing Approach K Taj, P Dutta, SM Francis – Citeseer ABSTRACT Every human relishes happiness when he or she becomes a creator. This happiness does not have any metrics. A person who plants a tree, to a person who works on the most complicated flight engines feels happy on his creation. The work attempted in this

A Review On Automatic Caption Generation For News Images SG Dahule, CS Suratkar – database – ijrise.org … We are not using any dictionaries to provide text to image relation … of Computational Linguistics: Human Language Technologies [3] has described usually to represent images of objects in some natural language or in a human readable form image annotation system is utilize in … Related articles

Image with a Message: Towards detecting non-literal image usages by visual linking L Weiland, L Dietz, SP Ponzetto – emnlp2015.org … is of great importance, this completely disregards other commonly used, yet extremely challenging, dimensions of natural language like metaphorical … Within experiments (bidirectional image-sentence retrieval and text-to-image coreference), they showed the usefulness of links … Related articles All 7 versions

Neural Talk for Videos S Sachdeva, A Mittal – 2015 – gautam5.cse.iitk.ac.in … They then use an adaptive technique that translates between the visual and the natural language. 3 Methodology … 5.2 MSCOCO We trained our model on the MSCOCO train set and evaluated our model on the MSCOCO test set for comparing our text to image model. … Related articles

An Introduction to Machine Translation & Transliteration A Kunchukuttan – cse.iitb.ac.in … Page 35. Speech-to-Speech Translation Page 36. Image Text to Image Text Translation Translation on smaller devices … Other introductory material – Kevin Knight’s MT workbook www.isi.edu/natural-language/mt/wkbk.pdf – ICON 2013 tutorial on Statistical Machine Translation … All 4 versions

Deep Learning applied to Image and Text matching AI Baqapuri – arXiv preprint arXiv:1601.03478, 2015 – arxiv.org … 65 4 Page 5. Abstract The ability to describe images with natural language sentences is the hallmark for image and language understanding. … Introduction The ability to describe images with natural language sentences is the hallmark for image and language understanding. … Related articles All 4 versions

Topics, Trends, and Resources in Natural Language Processing (NLP) M Bansal – Citeseer Page 1. Topics, Trends, and Resources in Natural Language Processing (NLP) Mohit Bansal TTI-Chicago (CSC2523, ‘Visual Recognition with Text’, UToronto, Winter 2015 – 01/21/2015) (various slides adapted/borrowed from Dan Klein’s and Chris Manning’s course slides) … Related articles All 2 versions

Deep multimodal semantic embeddings for speech and images D Harwath, J Glass – 2015 IEEE Workshop on Automatic …, 2015 – ieeexplore.ieee.org … A related problem is that of natural language caption generation … While our work in this paper does not aim to generate captions for images, it was originally inspired by the text-to-image alignment models presented by Karpathy in [6, 8]. In [6], Karpathy uses a refined version of … Cited by 6 Related articles All 7 versions

News2Images: Automatically Summarizing News Articles into Image-Based Contents via Deep Learning JW Ha, D Kang, H Pyo, J Kim – 2015 – cs.cmu.edu … have showed amazingly successful reports in diverse domains including speech recognition [2], image and video classification [5], natural language processing [12 … into not sentences but images even if there exist many methods for summarization [9] or text-to-image retrieval [1 … Related articles All 4 versions

An Approach to Document Fingerprinting Y Kim, S Ross – International Conference on Asian Digital Libraries, 2015 – Springer … on which we might capitalise. Keywords: Text analysis · Natural language processing · Patterns · Readability 1 Introduction The usefulness and potential … improvement of all processes. 3 From Text to Image In the first instance, the … Related articles All 4 versions

Articulated Motion Learning via Visual and Lingual Signals Z Wu, M Bansal, MR Walter – arXiv preprint arXiv:1511.05526, 2015 – arxiv.org … Lingual signals, such as natural language descriptions and instructions, offer a complementary means of conveying knowledge of such manipulation models and are suitable to a wide range of interactions (eg, remote manipulation). … Related articles All 3 versions

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks L Qin – cs224d.stanford.edu … Self-training pcfg grammars with latent annotations across languages. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2-Volume 2, pages 832–841. … From text to image to analysis: Visualization of chinese buddhist canon. … Related articles All 4 versions

The Painted Word-Writing The Image GPW Peppin – 2015 – ses.library.usyd.edu.au … communication, it can also be argued that ‘images’ work via a second system. This second system is one that is as fully expressive as natural language but also separate and structured independently of it.13 Some theorists consider visual and verbal meanings more dissimilar …

VQA: Visual Question Answering A Agrawal, J Lu, S Antol, M Mitchell, CL Zitnick… – arXiv preprint arXiv: …, 2015 – arxiv.org … In particular, research in image and video captioning that combines Computer Vision (CV), Natural Language Processing (NLP), and Knowledge Representation & Reasoning (KR) has dramatically increased in the past year [14], [7], [10], [36], [24], [22], [51]. … Cited by 1 Related articles

Saying What You’re Looking For: Linguistics Meets Video Search D Barrett, A Barbu, N Siddharth, J Siskind – 2015 – ieeexplore.ieee.org … This is in part because the most attractive interface for finding videos remains a natural-language query in the form of a sentence but determining if a sentence describes a video remains a difficult task. This task is difficult for … Related articles All 4 versions

Unsupervised semantic parsing of video collections O Sener, AR Zamir, S Savarese… – Proceedings of the IEEE …, 2015 – cv-foundation.org … Recent methods on natural language processing [40, 58] focus on semantic parsing of language recipes in order to extract actions and the objects in the form of predicates. Tenorth et al. [58] further process the predicates in order to form a complete logic plan. … Cited by 5 Related articles All 7 versions

Image hub explorer: Evaluating representations and metrics for content-based image retrieval and object recognition N Tomašev, D Mladenić – Multimedia Tools and Applications, 2015 – Springer Page 1. Multimed Tools Appl DOI 10.1007/s11042-014-2254-1 Image hub explorer: evaluating representations and metrics for content-based image retrieval and object recognition Nenad Tomašev · Dunja Mladenic Received … Cited by 4 Related articles All 9 versions

Towards Effective Image Annotation by Exploiting Multimodal Data X Xu – 2015 – catalog.lib.kyushu-u.ac.jp … pixels and textual words. Since the topic model is initially derived from the natural language processing community to cluster and to classify textual documents, it is a natural way to … as text-to-image search and image-to-text search to modeling images and associated text …

Predicting User-specific Temporal Retweet Count B Daróczy, R Pálovics, V Wieszner, R Farkas… – ntnu.no Page 1. Predicting User-specific Temporal Retweet Count Bálint Daróczy1 Róbert Pálovics1,2 Vilmos Wieszner3 Richárd Farkas3 András A. Benczúr1 1Institute for Computer Science and Control, Hungarian Academy of Sciences … Related articles

Levels of equivalence in the translation of two poems FND Carmo – 2015 – wiredspace.wits.ac.za … Jones claims that ‘the process of translating poetry operates at a multiple level of attention: to text, to image and to individual item, often simultaneously, though with different concentrations at different stages’ (1989:190). Most people read poetry for poetry’s sake. We read it for … Related articles All 2 versions

[BOOK] Tantric Visual Culture: A Cognitive Approach S Timalsina – 2015 – books.google.com … and the sacred. These images form a distinct language, following the similar parameters of our natural language. However, I do not consider that these images are simply parts of ritual that embody experience. On the contrary … Cited by 3 Related articles All 2 versions

Units of description: writing and reading the ‘archived’ photograph J Birkin – 2015 – eprints.soton.ac.uk Page 1. University of Southampton Research Repository ePrints Soton Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial … Related articles All 2 versions

[BOOK] Georgia O’Keeffe NJ Scott – 2015 – books.google.com Page 1. O’Keeffe Nancy J. Scott Critical lives Page 2. Georgia O’Keeffe Page 3. Titles in the series Critical Lives present the work of leading cultural figures of the modern period. Each book explores the life of the artist, writer, philosopher … Related articles