Text-to-Image & Natural Language 2017


Notes:

  • Imagistic modeling
  • Scene generation
  • Scene retrieval
  • Synthetic image
  • Text-graphic generation
  • Text-to-animation
  • Text-to-3D
  • Text-to-image conversion
  • Text-To-Scene Conversion System (TTSCS)
  • Text-to-video
  • Visualizer

Resources:

See also:

SceneMaker | Text-to-Image Systems


DualGAN: Unsupervised dual learning for image-to-image translation
Z Yi, H Zhang, P Tan, M Gong – arXiv preprint, 2017 – openaccess.thecvf.com
… be available. Inspired by dual learning from natural language translation [23], we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In …
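The dual-GAN mechanism described in the snippet trains two translators on unpaired data so that mapping a sample to the other domain and back reconstructs it. A minimal sketch of that cycle-reconstruction signal, assuming 1-D linear translators and omitting the adversarial terms (all names here are illustrative, not from the paper):

```python
# Toy illustration of the dual (cycle) reconstruction signal behind
# dual learning: two translators g: X->Y (y = a*x) and f: Y->X (x = b*y)
# are trained on UNPAIRED samples so that each round trip reconstructs
# its input. The adversarial losses of DualGAN are omitted for brevity.

def train_dual(xs, ys, steps=2000, lr=0.005):
    a, b = 0.5, 0.5
    for _ in range(steps):
        for x in xs:
            err = b * (a * x) - x          # X -> Y -> X reconstruction error
            a -= lr * 2 * err * b * x      # gradient of err**2 w.r.t. a
            b -= lr * 2 * err * a * x      # gradient of err**2 w.r.t. b
        for y in ys:
            err = a * (b * y) - y          # Y -> X -> Y reconstruction error
            b -= lr * 2 * err * a * y
            a -= lr * 2 * err * b * y
    return a, b

a, b = train_dual([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# The two translators learn to invert each other: a * b approaches 1.
```

With no paired supervision, only the requirement that f(g(x)) ≈ x and g(f(y)) ≈ y constrains the two maps; DualGAN additionally uses discriminators so each translator's outputs also look like real samples of the target domain.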

Person search with natural language description
S Li, T Xiao, H Li, B Zhou, D Yue, X Wang – Proc. CVPR, 2017 – openaccess.thecvf.com
… shot learning. Text-to-image retrieval can be conducted by calculating the distances in the embedding space. Frome et al. [6] associated … model jointly. 2. Benchmark for person search with natural language description Since there …
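Retrieval "by calculating the distances in the embedding space", as in the snippet above, reduces to nearest-neighbour search once text and images share a space. A minimal sketch with toy vectors (the embeddings themselves are hypothetical stand-ins for learned features):

```python
# Text-to-image retrieval in a joint embedding space: rank image
# embeddings by cosine similarity to a query text embedding.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_images(text_emb, image_embs):
    """Return image indices sorted from most to least similar."""
    sims = [(cosine(text_emb, e), i) for i, e in enumerate(image_embs)]
    return [i for _, i in sorted(sims, reverse=True)]

query = [1.0, 0.0, 1.0]
images = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.0]]
print(rank_images(query, images))  # → [1, 2, 0], best match first
```

The same ranking run in the other direction (image query against text embeddings) gives image-to-text retrieval, which is why papers in this list report both directions from one learned space.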

Visual dialog
A Das, S Kottur, K Gupta, A Singh… – Proceedings of the …, 2017 – openaccess.thecvf.com
… description [45, 53, 54], text-to-image coreference/grounding [8, 19, 24, 39, 41, 44], visual storytelling [2, 20], and of course, visual question answering (VQA) [2,4,9,14,16,32–34,43,62]. However, all of these involve (at most) a single-shot natural language interaction – there is …

Towards diverse and natural image descriptions via a conditional GAN
B Dai, D Lin, R Urtasun, S Fidler – arXiv preprint arXiv …, 2017 – openaccess.thecvf.com
… Towards Diverse and Natural Image Descriptions via a Conditional GAN. Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin – Department of Information Engineering, The Chinese University of Hong Kong; University …

Phrase localization and visual relationship detection with comprehensive image-language cues
BA Plummer, A Mallya, CM Cervantes… – Proceedings of the …, 2017 – openaccess.thecvf.com
… To extract as complete a set of relationships as possible, we use natural language processing (NLP) tools to resolve pronoun references within a sentence, e.g., by analyzing the …

Scribbler: Controlling deep image synthesis with sketch and color
P Sangkloy, J Lu, C Fang, F Yu… – IEEE Conference on …, 2017 – openaccess.thecvf.com
… Examples of control signals include 3D pose of objects [8], natural language [39], semantic attributes [50], semantic segmentation [5], and object keypoints and bounding box [38]. The artistic style transfer approach of Gatys et al …

Ask your neurons: A deep learning approach to visual question answering
M Malinowski, M Rohrbach, M Fritz – International Journal of Computer …, 2017 – Springer
… explicit assumptions about the compositionality of natural language sentences. Related to the Visual Turing Test, Malinowski and Fritz (2014c) have also combined a neural based representation with the compositionality of the language for the text-to-image retrieval task …

VQA: Visual Question Answering
A Agrawal, J Lu, S Antol, M Mitchell, CL Zitnick… – International Journal of …, 2017 – Springer
… Abstract. We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring …

Maximum-likelihood augmented discrete generative adversarial networks
T Che, Y Li, R Zhang, RD Hjelm, W Li, Y Song… – arXiv preprint arXiv …, 2017 – arxiv.org
… Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted. The fundamental …

The marginal value of adaptive gradient methods in machine learning
AC Wilson, R Roelofs, M Stern, N Srebro… – Advances in Neural …, 2017 – papers.nips.cc
… In Jian Su, Xavier Carreras, and Kevin Duh, editors, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pages 1–11. The Association for Computational Linguistics, 2016 … Generative adversarial text to image synthesis …

Report on the SIGIR 2016 workshop on neural information retrieval (Neu-IR)
N Craswell, WB Croft, J Guo, B Mitra, M de Rijke – ACM Sigir forum, 2017 – dl.acm.org
… state-of-the-art systems in areas of computer science, such as computer vision, speech processing, and natural language processing … range of tasks, including question/answering [34], proactive IR [21], knowledge-based IR [27], conversational models [29], text-to-image [6], and …

BreakingNews: Article annotation by image and text processing
A Ramisa, F Yan, F Moreno-Noguer… – IEEE transactions on …, 2017 – ieeexplore.ieee.org
… and language. Simultaneous progress in the fields of Computer Vision (CV) and Natural Language Processing (NLP) has led to impressive results in learning both image-to-text and text-to-image connections. Tasks such as …

Learning particle physics by example: location-aware generative adversarial networks for physics synthesis
L de Oliveira, M Paganini, B Nachman – Computing and Software for Big …, 2017 – Springer
… Often positioned as a complement to discriminative models, generative models face a more difficult challenge than their discriminative counterparts, i.e., reproducing rich, structured distributions such as natural language, audio, and images …

I2T2I: Learning text to image synthesis with textual data augmentation
H Dong, J Zhang, D McIlwraith, Y Guo – arXiv preprint arXiv:1703.06676, 2017 – arxiv.org
… that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen significant improvement through the adoption of recurrent neural networks (RNN). Meanwhile, text-to-image generation begun to …

Adversarial ranking for language generation
K Lin, D Li, X He, M Sun, Z Zhang – Advances in Neural Information …, 2017 – papers.nips.cc
… classification restriction and conceiving a relative space with rich information for the discriminator in the adversarial learning framework, the proposed learning objective is favourable for synthesizing natural language sentences in … Generative adversarial text to image synthesis …

Semantic regularisation for recurrent image annotation
F Liu, T Xiang, TM Hospedales, W Yang… – arXiv …, 2017 – openaccess.thecvf.com
… and sea. Image captioning has a related aim, with the difference of producing a complete natural language sentence description conditioned on the image content, rather than a simple unordered set of labels. For both problems …

School of Computer Science & Software Engineering
RN Date – Update, 2017 – signlearn.net
… Beneficial for the young, early acquisition of this natural language will assist to boost the communicating capacity of a young child. Introduction … Browse Category, Browse signs by category, User should be able to browse sign language dictionary by category (text to image) …

Adversarial neural machine translation
L Wu, Y Xia, L Zhao, F Tian, T Qin, J Lai… – arXiv preprint arXiv …, 2017 – arxiv.org
… [Hu et al., 2014] Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. Convolutional neural network architectures for matching natural language sentences. In NIPS, pages 2042–2050, 2014 … Generative adversarial text to image synthesis. In ICML, 2016 …

TransNets: Learning to transform for recommendation
R Catherine, W Cohen – arXiv preprint arXiv:1704.02298, 2017 – arxiv.org
… TransNets: Learning to Transform for Recommendation. Rose Catherine, William Cohen, School of Computer Science, Carnegie Mellon University, {rosecatherinek, wcohen}@cs.cmu.edu. ABSTRACT: Recently, deep learning …

Deep Learning for Image-to-Text Generation: A Technical Overview
X He, L Deng – IEEE Signal Processing Magazine, 2017 – ieeexplore.ieee.org
… A technical overview Generating a natural language description from an image is an emerging interdisciplinary problem at the intersection of computer vision, natural language processing, and artificial intelligence (AI) … Natural Language Processing, 2015, Beijing, China, pp …

Generative Neural Machine for Tree Structures
G Zhou, P Luo, R Cao, Y Xiao, F Lin, B Chen… – arXiv preprint arXiv …, 2017 – arxiv.org
… commonly used in the tasks of semantic analysis and understanding over the data of different modalities, such as natural language, 2D or … tree structures might support many intelligent applications with tree-structured output, such as neural-based parser and text-to-image task …

Generative adversarial networks: introduction and outlook
K Wang, C Gou, Y Duan, Y Lin… – IEEE/CAA Journal of …, 2017 – ieeexplore.ieee.org
… For example, deep learning has achieved a breakthrough effect in image classification [19], [20], and significantly improved the accuracy of speech recognition [21]. It has also been successfully applied in natural language processing and understanding [22] …

Visual reference resolution using attention memory for visual dialog
PH Seo, A Lehrmann, B Han, L Sigal – Advances in neural …, 2017 – papers.nips.cc
… 1 Introduction In recent years, advances in the design and optimization of deep neural network architectures have led to tremendous progress across many areas of computer vision (CV) and natural language processing (NLP) …

TextureGAN: Controlling deep image synthesis with texture patches
W Xian, P Sangkloy, J Lu, C Fang, F Yu… – arXiv preprint arXiv …, 2017 – arxiv.org
… Practical image synthesis applications require perceptually controllable interfaces, ranging from high-level attributes, such as object classes [28], object poses [4], natural language descriptions [30], to fine-grained details, such as segmentation masks [16], sketches [12, 31 …

Attributes as semantic units between natural language and visual recognition
M Rohrbach – Visual Attributes, 2017 – Springer
… Visual Attributes pp 301-330 | Cite as. Attributes as Semantic Units Between Natural Language and Visual Recognition … From [64]. 12.3.1 Translating Image and Video Content to Natural Language Descriptions. Video Captioning …

Predicting Visual Features from Text for Image and Video Caption Retrieval
J Dong, X Li, CGM Snoek – arXiv preprint arXiv:1709.01362, 2017 – arxiv.org
… 32]. We also rely on a layered neural network architecture, but rather than predicting a class label for an image, we strive to predict a deep visual feature from a natural language description for the purpose of caption retrieval …

Image Pivoting for Learning Multilingual Multimodal Representations
S Gella, R Sennrich, F Keller, M Lapata – arXiv preprint arXiv:1707.07601, 2017 – arxiv.org
… [retrieval results table: Text-to-Image and Image-to-Text R@1/R@5/R@10/median rank for VSE (Kiros et al., 2015), OE (Vendrov et al., 2016), and PIVOT] … Multilingual multi-modal embeddings for natural language processing …

Using Co-Captured Face, Gaze, and Verbal Reactions to Images of Varying Emotional Content for Analysis and Semantic Alignment
A Gangji, T Walden, P Vaidyanathan… – The AAAI-17 Workshop …, 2017 – par.nsf.gov
… What are you talking about? Text-to-image coreference. In CVPR 2014 … IWCS 2015 76. Vaidyanathan, P.; Prudhommeaux, E.; Alm, C.; and Pelz, JB 2015b. Computational integration of human vision and natural language through bitext alignment …

Recent advances in convolutional neural networks
J Gu, Z Wang, J Kuen, L Ma, A Shahroudy, B Shuai… – Pattern Recognition, 2017 – Elsevier

Conditional generation of multi-modal data using constrained embedding space mapping
S Chaudhury, S Dasgupta, A Munawar… – arXiv preprint arXiv …, 2017 – arxiv.org
… The proposed learned mapping is akin to the example of humans visualizing "mango" from its natural language or speech … Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee, "Generative adversarial text to image synthesis," in …

Binary set embedding for cross-modal retrieval
M Yu, L Liu, L Shao – IEEE transactions on neural networks and …, 2017 – ieeexplore.ieee.org
… been well studied. The construction of local features for texts can be done by the word vector techniques [27]–[29] in natural language processing, which have been shown the superiority in machine translation. Once the learning …

Video Generation From Text
Y Li, MR Min, D Shen, D Carlson, L Carin – arXiv preprint arXiv …, 2017 – arxiv.org
… Text-to-video generation requires a stronger conditional generator than what is necessary for text-to-image generation … The text description t is given as a sequence of words (natural language). The index i is only included when necessary for clarity …

Vision-Language Fusion for Object Recognition.
SR Shiang, S Rosenthal, A Gershman, JG Carbonell… – AAAI, 2017 – aaai.org
… In addition, robots interacting with humans via natural language would also need such an ability to integrate what has been seen and what has been told … We note that speech recognition and natural language parsing are outside the scope of this paper …

pix2code: Generating Code from a Graphical User Interface Screenshot
T Beltramelli – arXiv preprint arXiv:1705.07962, 2017 – arxiv.org
… Furthermore, Ling et al. [12] recently demonstrated program synthesis from a mixed natural language and structured program specification as input … IEEE, 2015. [15] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis …

Unsupervised visual-linguistic reference resolution in instructional videos
DA Huang, JJ Lim, L Fei-Fei… – arXiv preprint arXiv …, 2017 – openaccess.thecvf.com
… 10, 44, 55]. Our approach of extracting graph associating the entities with action outputs is related to works in robotics where the goal is to transform natural language instructions for the robots to execute [26, 32, 56, 58]. It is …

Identity-aware textual-visual matching with latent co-attention
S Li, T Xiao, H Li, W Yang… – arXiv preprint arXiv …, 2017 – openaccess.thecvf.com
… The top-1 and top-10 accuracies are chosen to evaluate the performance of person search with natural language description following [15], which are the percentages of successful … We report the AP@50 for text-to-image retrieval and the top-1 accuracy for image-to-text retrieval …

Multimodal Dialogs (MMD): A large-scale dataset for studying multimodal domain-aware conversations
A Saha, M Khapra, K Sankaranarayanan – arXiv preprint arXiv:1704.00200, 2017 – arxiv.org
… 2, we use a standard recurrent neural network based decoder with GRU cells. Such a decoder has been used successfully for various natural language generation tasks including text conversation systems (Serban et al., 2016b) …

Generating descriptions with grounded and co-referenced people
A Rohrbach, M Rohrbach, S Tang… – arXiv preprint arXiv …, 2017 – openaccess.thecvf.com
… Intelligent Systems, Tübingen, Germany Abstract Learning how to generate descriptions of images or videos received major interest both in the Computer Vision and Natural Language Processing communities. While a few works …

Picture it in your mind: Generating high level visual representations from textual descriptions
F Carrara, A Esuli, T Fagni, F Falchi… – Information Retrieval …, 2017 – Springer
… The text-to-image branch is, in any case, a regressor … We believe MsCOCO to be more fit to the scenario we want to explore, since the captions associated to the images are expressed in natural language, thus semantically richer than a short list of keywords composing a query …

Few-Shot Adversarial Domain Adaptation
S Motiian, Q Jones, S Iranmanesh… – Advances in Neural …, 2017 – papers.nips.cc
… [5] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 conference on empirical methods in natural language processing, pages 120–128. Association for Computational Linguistics, 2006 …

Multimodal Machine Learning: A Survey and Taxonomy
T Baltrušaitis, C Ahuja, LP Morency – arXiv preprint arXiv:1705.09406, 2017 – arxiv.org
… [taxonomy table: encoder-decoder models for image description [50], [121], [142], image captioning [105], [134], video description [213], [241], and text-to-image [132], [171]] … translation has seen renewed interest due to combined efforts of the computer vision and natural language processing (NLP …

Text to Image Generative Model Using Constrained Embedding Space Mapping
S Chaudhury, S Dasgupta, A Munawar, MAS Khan… – researchgate.net
… We present a conditional generative method that maps low-dimensional embeddings of image and natural language to a common latent space, hence extracting semantic …

Spectral Graph-Based Method of Multimodal Word Embedding
K Fukui, T Oshikiri, H Shimodaira – … based Methods for Natural Language …, 2017 – aclweb.org
… Word embedding plays important roles in the field of Natural Language Processing (NLP) … Our experimental results showed that MM-Eigenwords captures both semantic and text-to-image similarities, and we found that there is a trade-off between these two similarities …

Text-to-Image Generation Using Multi-Instance StackGAN
A Fu, Y Hou – cs231n.stanford.edu
… networks enables capturing and generalizing the semantics of the input texts, which has improved the performance of text-to-image generation tasks … resolution images with photo-realistic details [1]. In this paper, we are interested in translating a natural language caption into an …

Deep learning for natural language processing: advantages and challenges
H Li – National Science Review, 2017 – academic.oup.com
… text to image), in which query and image are first transformed into vector representations with CNNs, the representations are matched with DNN and the relevance of the image to the query is calculated [3]. Deep learning is also employed in generation-based natural language …

Language-Based Image Editing with Recurrent Attentive Models
J Chen, Y Shen, J Gao, J Liu, X Liu – arXiv preprint arXiv:1711.06288, 2017 – arxiv.org
… Dataset The ReferIt dataset is composed of 19,894 photographs of real world scenes, along with 130,525 natural language descriptions on 96,654 … recurrent attentive models to dynamically decide, for each region of an image, whether to continue the text-to-image fusion process …

Text to Image Synthesis Using Stacked Generative Adversarial Networks
A Zaidi – cs231n.stanford.edu
… Text to Image Synthesis Using Stacked Generative Adversarial Networks. Ali Zaidi, Stanford University & Microsoft AIR, alizaidi@microsoft.com. Abstract: Human beings are quickly able to conjure and imagine images related to natural language descriptions …

Deep Attribute-preserving Metric Learning for Natural Language Object Retrieval
J Li, Y Wei, X Liang, F Zhao, J Li, T Xu… – Proceedings of the 2017 …, 2017 – dl.acm.org
… text-to-image alignment and reasoned about object co-reference in text for 3D scene parsing. Rohrbach et al. [34] used a recurrent network to encode the phrase and then learned to attend to the relevant image region by trying to reconstruct the input phrase. Natural Language …

Watch What You Just Said: Image Captioning with Text-Conditional Attention
L Zhou, C Xu, P Koch, JJ Corso – … of the on Thematic Workshops of ACM …, 2017 – dl.acm.org
… CCS CONCEPTS: Computing methodologies – Natural language generation; Neural networks; Computer vision representations … To overcome it, we introduce an embedding matrix, which contains various text-to-image masks …

TensorLayer: A Versatile Library for Efficient Deep Learning Development
H Dong, A Supratak, L Mai, F Liu… – Proceedings of the …, 2017 – dl.acm.org
… KEYWORDS: Deep Learning, Reinforcement Learning, Parallel Computation, Computer Vision, Natural Language Processing, Data Management … [figure examples: "A red bird with blue head has grey wings" – text-to-image synthesis, semantic image transformation] …

Person Re-Identification with Vision and Language
F Yan, K Mikolajczyk, J Kittler – arXiv preprint arXiv:1710.01202, 2017 – arxiv.org
… caption-based supervision [11], text-to-image coreference [21], zero-shot visual learning using purely textual description [9], and visual question answering [1]. Despite those successes as well as realistic and practical case scenarios of using natural language descriptions in …

Framework for Integration of Medical Image and Text-Based Report Retrieval to Support Radiological Diagnosis
S Kulkarni, A Savyanavar, P Kulkarni… – … Signal and Image …, 2017 – books.google.com
… Thus the modality of data is switched from image to text and text to image … The most commonly used weighting scheme is TF×IDF. • Natural Language Processing: The most critical issue for information retrieval performance is the term mismatch …

Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
R Yeh, J Xiong, WM Hwu, M Do… – Advances in Neural …, 2017 – papers.nips.cc
… [23] C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? text-to-image coreference. In Proc. CVPR, 2014. [24] J. Krishnamurthy and T. Kollar. Jointly learning to parse and perceive: connecting natural language to the physical world. In Proc …

End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
M Dorfer, J Schlüter, A Vall, F Korzeniowski… – arXiv preprint arXiv …, 2017 – arxiv.org
… Specialized extensions of this loss achieved state-of-the-art results in various domains such as natural language processing (Hermann & Blunsom, 2013), image captioning (Karpathy & Fei-Fei, 2015), and text-to-image retrieval (Vendrov et al., 2016) …
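The pairwise ranking loss mentioned in the snippet penalizes any non-matching pair scored within a margin of the matching pair. A minimal sketch over precomputed similarity scores (the numbers are illustrative, not from the paper):

```python
# Hinge-style pairwise ranking loss for cross-modal retrieval:
# each negative (non-matching) pair should score at least `margin`
# below the positive (matching) pair, otherwise it contributes loss.

def ranking_loss(pos_score, neg_scores, margin=0.2):
    return sum(max(0.0, margin - pos_score + s) for s in neg_scores)

# The well-separated negative (0.5) contributes nothing; the two
# near-misses (0.85, 0.95) violate the margin and are penalized.
loss = ranking_loss(0.9, [0.5, 0.85, 0.95])
```

Minimizing this objective over a batch pushes matching text-image pairs above all non-matching ones, which is what makes the learned embedding space usable for distance-based retrieval.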

Retrieval of Sentence Sequences for an Image Stream via Coherence Recurrent Convolutional Networks
C Park, Y Kim, G Kim – IEEE transactions on pattern analysis …, 2017 – ieeexplore.ieee.org
… Recently, Hu et al. [33] address a task of natural language object retrieval that takes both an image and associate phrases, and localize bounding box … They propose a latent structural SVM framework to learn the semantic relevance relations from text to image sequences …

Synthesizing Novel Pairs of Image and Text
J Xie, T Bao – arXiv preprint arXiv:1712.06682, 2017 – arxiv.org
… 2.2 Image Captioning. On the flip side of text to image, the problem of generating natural language descriptions from images has seen tremendous improvements due to the clever amalgamation of deep convolution networks with recurrent neural networks …

Learning to Forecast Videos of Human Activity with Multi-granularity Models and Adaptive Rendering
M Zhai, J Chen, R Deng, L Chen, L Zhu… – arXiv preprint arXiv …, 2017 – arxiv.org
… Within robotics, work has explored predicting the consequences after interactions between an agent and its environment [7]. In natural language processing, approaches [20, 24] have been proposed to tackle tasks … such as text to image or image to text synthesis …

A Novel Approach to Artistic Textual Visualization via GAN
Y Ma, M Ma – arXiv preprint arXiv:1710.10553, 2017 – arxiv.org
… This evaluation can illustrate that the natural language analysis section has a good understanding of the poems, which is a very critical implementation in artistic textual visualization. 3.3 … Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016 …

Overview of Text Annotation with Pictures
JK Poots, E Bagheri – IT Professional, 2017 – ieeexplore.ieee.org
… that pictures are also represented by text, for example, a caption, or a short paragraph, so that the matching exercise means matching input text to image description text … There are additional matching measures based on more formal natural language processing techniques [17 …

A Pedagogical Approach to Emancipate Second Language (L2) Learners and to develop Second Language Reading and Writing Skills
AC Sanchez – ijllnet.com
… Vasallo (2015) says, that interlingual translation “is a transferring of sense from one natural language to another … Loffredo and Perterhella (2014) state that “The change in modality (from written text to image, for example), brings into play other channels in the ‘reinvention’ of, and …

What no robot has seen before—Probabilistic interpretation of natural-language object descriptions
D Nyga, M Picklum, M Beetz – Robotics and Automation (ICRA) …, 2017 – ieeexplore.ieee.org
What No Robot Has Seen Before – Probabilistic Interpretation of Natural-language Object Descriptions … Abstract—We investigate the task of recognizing objects of daily use in human environments purely based on object descriptions given in natural language …

Adversarial Discrete Sequence Generation
A Vani – nevitus.com
… distance [Arjovsky et al., 2017]. While many variants of GANs for images have come up, the applications of adversarial generative models have been less prominent in natural language processing. Although MLE works much …

Feature-Matching Auto-Encoders
D Tran, Y Burda, I Sutskever – bayesiandeeplearning.org
… Effective Approaches to Attention-based Neural Machine Translation. In Empirical Methods on Natural Language Processing … [14] Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. (2016). Generative Adversarial Text to Image Synthesis …

Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering
P Lu, H Li, W Zhang, J Wang, X Wang – arXiv preprint arXiv:1711.06794, 2017 – arxiv.org
… The algorithms are required to answer natural language questions about a given image’s contents. Compared with the conventional visual-language tasks such as image captioning and text-to-image retrieval, the VQA task requires the algorithms to have a better …

Online Cross-Modal Scene Retrieval by Binary Representation and Semantic Graph
M Qi, Y Wang, A Li – Proceedings of the 2017 ACM on Multimedia …, 2017 – dl.acm.org
… Considering that the users always intend to search images by some natural language descriptions or mine text messages by visual data, cross-modal hashing becomes an important problem … Figure 1: Overview of proposed framework (image-to-text and text-to-image) …

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
T Xu, P Zhang, Q Huang, H Zhang, Z Gan… – arXiv preprint arXiv …, 2017 – arxiv.org
… multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details at different sub-regions of the image by paying attention to the relevant words in the natural language description …
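The word-level attention that the AttnGAN abstract refers to can be sketched as a softmax over region-word similarities; each image sub-region then receives a context vector that mixes the word embeddings. All vectors below are toy values, not the model's learned features:

```python
# Sketch of word-level attention: an image sub-region attends over word
# embeddings, weighting each word by the softmax of its dot product with
# the region feature, then forming a weighted-sum context vector.
import math

def attend(region, words):
    """Return attention weights over `words` and the attended context vector."""
    scores = [sum(r * w for r, w in zip(region, vec)) for vec in words]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(wt * vec[k] for wt, vec in zip(weights, words))
               for k in range(len(region))]
    return weights, context

# A region aligned with the first word attends mostly to it.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the full model this context vector conditions the generator at each refinement stage, so different sub-regions are driven by different words of the description.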

Deep Matching Autoencoders
T Mukherjee, M Yamada, TM Hospedales – arXiv preprint arXiv …, 2017 – arxiv.org
… 2.2. Applications Visual Description with Natural Language Generating or matching natural language descriptions for images and videos has recently become a popular topic in cross-modal learning in the last five years [37] …

Cross-Media Retrieval of Tourism Big Data Based on Deep Features and Topic Semantics
Y Li, J Du, Z Lin, L Ye – … Conference on Intelligent Data Engineering and …, 2017 – Springer
… Text mining technologies based on natural language have made great achievements, but we still face the problem of only obtaining limited semantic … The results of different methods of the same attractions, as well as the image-to-text results and text-to-image results from the …

Retinal Microaneurysm Detection Using Clinical Report Guided Multi-sieving CNN
L Dai, B Sheng, Q Wu, H Li, X Hou, W Jia… – … Conference on Medical …, 2017 – Springer
… neural network (MS-CNN) which leverages a small amount of supervised information in clinical reports to identify the potential MA regions via a text-to-image mapping in … Then, we extract the lesion information from the clinical text reports written in the natural language …

A Multiview Approach to Learning Articulated Motion Models
AF Daniele, TM Howard, MR Walter – ttic.edu
… Natural language descriptions provide a flexible and efficient means by which humans can provide complementary information in a weakly supervised manner suitable for a variety of different interactions (eg, demonstrations and remote manipulation) …

Text2Action: Generative Adversarial Synthesis from Language to Action
H Ahn, T Ha, Y Choi, H Yoo, S Oh – arXiv preprint arXiv:1710.05298, 2017 – arxiv.org
… [5] M. Plappert, C. Mandery, and T. Asfour, “Learning a bidirectional mapping between human whole-body motion and natural language using deep … [9] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” in Proc …

DyadGAN: Generating Facial Expressions in Dyadic Interactions
Y Huang, SM Khan – Computer Vision and Pattern …, 2017 – openaccess.thecvf.com
… Advances in automated speech recognition and natural language processing have made possible virtual personal assistants such as Apple Siri, Amazon Alexa … information such as labels [23], text [25] and images [14]; GANs based methods have tackled text-to-image/image-to …

Visual-textual Attention Driven Fine-grained Representation Learning
X He, Y Peng – arXiv preprint arXiv:1709.00340, 2017 – arxiv.org
… results. As is known to all, when we describe the object of an image into text via natural language, we only focus on the pivotal characteristics, and rarely pay attention to common characteristics as well as the background areas …

Think Visually: Question Answering Through Virtual Imagery
A Goyal, J Wang, J Deng – 2017 – openreview.net
… Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. Ask me anything: Dynamic memory networks for natural language processing … Generative adversarial text to image synthesis …

From computational narrative analysis to generation: a preliminary review
J Valls-Vargas, J Zhu, S Ontañón – … on the Foundations of Digital Games, 2017 – dl.acm.org
… to generate virtual environments for a story to happen [12, 40]; or using automatically annotated text as templates for natural language generation systems … Other examples include text-to-image systems that extract semantic models from short, simple and limited domain text …

Neural Motifs: Scene Graph Parsing with Global Context
R Zellers, M Yatskar, S Thomson, Y Choi – arXiv preprint arXiv:1711.06640, 2017 – arxiv.org
… motorcycle”). Predicting such graph representations has been shown to improve natural language based image tasks [17, 38, 46] and has the potential to significantly expand the scope of applications for computer vision systems …

Sentiment Classification with Word Attention based on Weakly Supervised Leaning
G Lee, J Jeong, S Seo, CY Kim, P Kang – arXiv preprint arXiv:1709.09885, 2017 – arxiv.org
… It is one of the most active research areas in natural language processing (NLP) and has also been widely studied in data mining … transfer these attentions to other machine learning models to solve more complicated tasks such as image captioning or text to image generation (Xu et …

Entity linking across vision and language
AN Venkitasubramanian, T Tuytelaars… – Multimedia Tools and …, 2017 – Springer
… Vision tasks often benefit from the associated text [42] while Natural Language Processing (NLP) tasks benefit from the vision. As an example, consider Fig … [23] where images of furniture (represented as 3D cuboids) are mapped with natural language descriptions of the room …

Dual Learning for Cross-domain Image Captioning
W Zhao, W Xu, M Yang, J Ye, Z Zhao, Y Feng… – Proceedings of the 2017 …, 2017 – dl.acm.org
… capability to describe what it sees. Developing image captioning methodologies connects researches from both the community of computer vision and the community of natural language processing. Its success lies in how to …

3D Abstract Scene Synthesis from Sentences
J Shao – 2017 – escholarship.org
… text-to-image task is based on two subtasks: learning a text feature representation that captures most of the visual details in sentences; synthesizing image pixels from these features. The two subtasks have been addressed well over the past several years by natural language processing …

Generative Attention Model with Adversarial Self-learning for Visual Question Answering
I Ilievski, J Feng – Proceedings of the on Thematic Workshops of ACM …, 2017 – dl.acm.org
… KEYWORDS Visual Question Answering, Multimodal Representation, Adversarial Learning 1 INTRODUCTION Visual question answering (VQA) is an active research area that lies at the intersection of multimedia, natural language processing, and machine learning …

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
J Gu, J Cai, S Joty, L Niu, G Wang – arXiv preprint arXiv:1711.06420, 2017 – arxiv.org
… Abstract Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities … The cross-modal retrievals (Image-to-Text and Text-to-Image) are shown in different color …
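Several of the retrieval entries in this list score text-to-image matches by distance in a joint embedding space, as noted above ("Text-to-image retrieval can be conducted by calculating the distances in the embedding space"). A minimal NumPy sketch of that scoring step, with toy vectors standing in for real text and image encoder outputs; the names and dimensions here are illustrative, not taken from any of the cited papers:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity matrix between two sets of row vectors.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def text_to_image_retrieval(text_emb, image_embs):
    # Rank images by cosine similarity to the query text embedding.
    sims = cosine_sim(text_emb[None, :], image_embs)[0]
    return np.argsort(-sims)  # image indices, best match first

# Toy 4-dim embeddings standing in for encoder outputs.
images = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0]])
query = np.array([0.9, 0.1, 0.0, 0.0])
ranking = text_to_image_retrieval(query, images)  # → [0, 2, 1]
```

The same similarity matrix also serves the reverse direction (image-to-text), which is why most of these papers report R@1/R@5/R@10 for both.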

Automatic Story Extraction for Photo Stream via Coherence Recurrent Convolutional Neural Network
??? – 2017 – s-space.snu.ac.kr
… Among them, we will focus on visual understanding and natural language expression. Various studies have … better take the whole image stream into consideration to produce natural language descriptions. While almost all previous studies have dealt with the …

Machines that Learn with Limited or No Supervision: A Survey on Deep Learning Based Techniques
S Roy, S Herath, R Nock, F Porikli – porikli.com
… are successfully applied to obtain lower-dimensional embeddings and demonstrated higher accuracy results in numerous classification and regression tasks across various fields ranging from image understanding, speech recognition, natural language processing, sentiment …

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
F Faghri, DJ Fleet, JR Kiros, S Fidler – 2017 – openreview.net
Under review as a conference paper at ICLR 2018. Abstract: We present a …
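VSE++'s central idea is to replace the usual sum over negatives in the triplet ranking loss with the single hardest negative in the mini-batch. A rough NumPy sketch of that max-of-hinges objective; this is a simplified reading of the paper's loss, not the authors' code, and the margin value is illustrative:

```python
import numpy as np

def vse_pp_loss(sim, margin=0.2):
    """Max-of-hinges ranking loss in the style of VSE++.

    sim: (n, n) similarity matrix where sim[i, j] compares image i
    with caption j; diagonal entries are the matching pairs."""
    pos = np.diag(sim)
    # Hinge cost for every (image, caption) pair.
    cost_c = np.clip(margin + sim - pos[:, None], 0, None)  # captions as negatives
    cost_i = np.clip(margin + sim - pos[None, :], 0, None)  # images as negatives
    np.fill_diagonal(cost_c, 0)  # a pair is not its own negative
    np.fill_diagonal(cost_i, 0)
    # Penalize only the hardest negative per image (row) and per caption (column).
    return float(cost_c.max(axis=1).sum() + cost_i.max(axis=0).sum())
```

With well-separated pairs the loss is zero; e.g. `vse_pp_loss(np.array([[0.9, 0.1], [0.2, 0.8]]))` returns `0.0`. In the paper `sim` would be cosine similarities between L2-normalized image and caption embeddings; here it is just any square score matrix with matching pairs on the diagonal.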

Learning Articulated Object Models from Language and Vision
AF Daniele, TM Howard, MR Walter – ttic.edu
… We model linguistic information using a probabilistic language model that grounds natural language descriptions to their referent kinematic motion … language model that grounds natural language descriptions into a structured representation of an object’s articulation manifold …

Learning Distributions of Meant Color
L White, R Togneri, W Liu, M Bennamoun – arXiv preprint arXiv …, 2017 – arxiv.org
… From this has come the corresponding area in natural language processing. The ISCC-NBS color system (Kelly et al., 1955) has been key to these developments … R^3 is the scaled HSV color space, and T is the natural language space …

Cross-modal Retrieval via Memory Network
G Song, X Tan – parnec.nuaa.edu.cn
… a number of recent efforts [4, 28] have explored ways to use RNNs or LSTM-based models with memory in the natural language processing field … In more detail, performance is greatly improved by CMMN by at least 30% in the case of text-to-image retrieval, while in the case of the …

Automatic Medical Image Multilingual Indexation Through a Medical Social Network
MG Ayadi, R Bouslimi, J Akaichi, H Hedhli – Prediction and Inference from …, 2017 – Springer
… Similarly, research in the other direction via text-to-image synthesis [12, 13, 14] has also helped to harvest images, mostly for concrete … According to Frakes and Fox [41], stemming algorithms are used in many applications related to natural language processing such as text …

PixelBrush: Art Generation from text with GANs
J Zhi – cs231n.stanford.edu
… In their work, they proposed a novel two-staged approach for text-to-image synthesis … We will use natural language as input, so people can describe what kind of artwork they want, and then our tool PixelBrush will generate an image according to the description that is provided …

A Continuous Approach to Controllable Text Generation using Generative Adversarial Networks
DI Helgøy, M Lund – 2017 – brage.bibsys.no
… List of Figures: 5.3 Text-to-image GAN architecture … Acronyms: MLE, maximum likelihood estimation; MSE, mean squared error; NLP, natural language processing; PCA, principal component analysis …

Improving Consistency and Correctness of Sequence Inpainting using Semantically Guided Generative Adversarial Network
A Lahiri, A Jain, PK Biswas, P Mitra – arXiv preprint arXiv:1711.06106, 2017 – arxiv.org
… The GAN framework is flexible in accepting different genres of conditioning inputs such as class labels [33], natural language description [37], localization information [39] and even an entire image [50, 16] or a sequence of images [29] …

Leveraging Multimodal Perspectives to Learn Common Sense for Vision and Language Tasks
X Lin – 2017 – vtechworks.lib.vt.edu
… (VQA). VQA is the task of answering open-ended natural language questions about images … [75] Visual Question Answering (VQA) is the task of taking as input an image and a free-form natural language question about the image, and producing an accurate answer …

Measuring and Predicting Tag Importance for Image Retrieval
S Li, S Purushotham, C Chen, Y Ren… – IEEE transactions on …, 2017 – ieeexplore.ieee.org
… subspace between the visual and the textual domains has been built, cross-modality information search, such as text-to-image and image … to measure the object and scene tag importance from human provided sentence descriptions based on natural language processing (NLP …

Tree Memory Networks for Modelling Long-term Temporal Dependencies
T Fernando, S Denman, A McFadyen… – arXiv preprint arXiv …, 2017 – arxiv.org
… Weston et al. (2014) have utilised a memory module to improve the accuracy on natural language processing problems. Their proposed memory architecture is not fully extendible considering the usage of an offline feature engineering process using a bag-of-words approach …

Learning a Semantically Discriminative Joint Space for Attribute Based Person Re-identification
Z Yin, WS Zheng, A Wu, HX Yu, H Wang… – arXiv preprint arXiv …, 2017 – arxiv.org
Learning a Semantically Discriminative Joint Space for Attribute Based Person Re-identification. Zhou Yin, Wei-Shi Zheng, Ancong Wu, Hong-Xing Yu, Hai Wan, Jianhuang Lai. School of Data and Computer Science, Sun …

Sentence Directed Video Object Codiscovery
H Yu, JM Siskind – International Journal of Computer Vision, 2017 – Springer
… While conceptually, such semantic information is typically conveyed by sentences, our focus is on the computer vision task of using such information when processing video, not the natural language processing task of extracting such information from sentences …

Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval
Y Hu, L Zheng, Y Yang, Y Huang – IEEE Transactions on …, 2017 – ieeexplore.ieee.org

Attentive Semantic Video Generation using Captions
T Marwah, G Mittal… – 2017 IEEE International …, 2017 – openaccess.thecvf.com
… start-of-video frame. This frame marks the beginning of every given video. It contains all 0s, resembling the start-of-sentence tag used to identify the beginning of a sentence in Natural Language Processing [23]. 4.1. Results on Generation …

Automated understanding of data visualizations
ST Alsheikh – 2017 – dspace.mit.edu
… image understanding. For example, [33] show that simpler, abstract images (like clip art) can be used in place of natural images to understand the semantic relationship between visual media and their natural language representation. 22 Page 23. Chapter 3 …

Discourse-Level Language Understanding with Deep Learning
MN Iyyer – 2017 – drum.lib.umd.edu
… Designing computational models that can understand language at a human level is a foundational goal in the field of natural language processing (NLP). Given a sentence …

Deep Multimodal Learning: A Survey on Recent Advances and Trends
D Ramachandram, GW Taylor – IEEE Signal Processing …, 2017 – ieeexplore.ieee.org
… now interests researchers in academia, but also industry, and it has resulted in state-of-the-art performance for many practical problems, especially in areas involving high-dimensional unstructured data such as in computer vision, speech, and natural language processing …

Embedding Deep Networks into Visual Explanations
Z Qi, F Li – arXiv preprint arXiv:1709.05360, 2017 – arxiv.org
… 5, 21, 34]. In Natural Language Processing, Kulesza et al. [19] propose … 44, 39, 28]. [24, 28] propose to explain via visual question answering, which utilizes both natural language descriptions and heatmaps. Ribeiro et al. [27] propose …

Generative models of visually grounded imagination
R Vedantam, I Fischer, J Huang, K Murphy – arXiv preprint arXiv …, 2017 – arxiv.org
… This is similar to the capacity of natural language to “make infinite use of finite means”, in the words of Noam Chomsky (1965). Figure 1: Illustration of a compositional concept hierarchy related to birds, derived from two independent attributes, size and color …

Real-valued (medical) time series generation with recurrent conditional GANs
C Esteban, SL Hyland, G Rätsch – arXiv preprint arXiv:1706.02633, 2017 – arxiv.org
… This approach has been mainly used for image generation tasks [21, 18, 2]. Recently, Conditional GAN architectures have been also used in natural language processing, including translation [27] and dialogue generation [16 … Generative adversarial text to image synthesis …

Deep Learning the Physics of Transport Phenomena
AB Farimani, J Gomes, VS Pande – arXiv preprint arXiv:1709.02432, 2017 – arxiv.org
… In recent years, there have been several advances in the fields of computer vision and natural language processing applications brought on by deep learning [2–5]. The convolutional … transfer, texture mapping, text-to-image translation, image-to-image translation [16–18]. We …

Joint intermodal and intramodal label transfers for extremely rare or unseen classes
GJ Qi, W Liu, C Aggarwal… – IEEE transactions on …, 2017 – ieeexplore.ieee.org
Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes. Guo-Jun Qi, Member, IEEE, Wei Liu, Charu Aggarwal, Fellow, IEEE, and Thomas Huang, Life Fellow, IEEE. Abstract: In this paper …

Long short-term memory recurrent neural networks for classification of acute hypotensive episodes
AS Jaffe – 2017 – dspace.mit.edu
… takes sequential input and produces sequential output by sharing parameters between time steps. RNNs have led to breakthrough results in natural language processing [20], image captioning [13], and speech recognition [7]. Though RNNs have proven …

Pro Deep Learning with TensorFlow
S Pattanayak – Springer
… Chapter 4 — Natural Language Processing Using Recurrent Neural Networks: This chapter deals with natural language processing using … reasoning, semantic segmentation, video generation, style transfer from one domain to another, and text-to-image generation applications …

CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning
Y Peng, J Qi, Y Yuan – arXiv preprint arXiv:1710.05106, 2017 – arxiv.org
IEEE Transactions on Multimedia. CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning. Yuxin Peng, Jinwei Qi and Yuxin Yuan. Abstract: It is known that the inconsistent …

A survey on story generation techniques for authoring computational narratives
B Kybartas, R Bidarra – … on Computational Intelligence and AI in …, 2017 – ieeexplore.ieee.org
… have come out of the research community. In Façade, one takes the role of an acquaintance of a married couple and is able to interact with both characters using natural language text input. The plot uses constraints to ensure …

Visual-Linguistic Semantic Alignment: Fusing Human Gaze and Spoken Narratives for Image Region Annotation
P Vaidyanathan – 2017 – scholarworks.rit.edu
… images. Automatic semantic image region annotation is the task of computationally identifying image regions that are perceptually meaningful for humans and associating them with appropriate natural language concept labels …

Learning Image Classification and Retrieval Models (Apprentissage de Modèles pour la Classification et la Recherche d’Images)
F Jurie, C Lampert, B Caputo, C Schmid, J Verbeek… – hal.univ-grenoble-alpes.fr
… Such a codebook does not exist as in natural language. Then, each sample from an unordered set of samples taken from an image is assigned to its closest match in the codebook, increasing the word-count of the corresponding visual-word …

Modeling and Mining Domain Shared Knowledge for Sentiment Analysis
GY Zhou, JX Huang – ACM Transactions on Information Systems (TOIS), 2017 – dl.acm.org
… Sentiment analysis refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective … (2015) proposed a robust and non-negative collective matrix factorization model to handle noise in text-to-image transfer learning …

Online transfer learning with multiple homogeneous or heterogeneous sources
Q Wu, H Wu, X Zhou, M Tan, Y Xu… – IEEE Transactions on …, 2017 – ieeexplore.ieee.org
Page 1. Online Transfer Learning with Multiple Homogeneous or Heterogeneous Sources Qingyao Wu, Hanrui Wu, Xiaoming Zhou, Mingkui Tan, Yonghui Xu, Yuguang Yan, and Tianyong Hao Abstract—Transfer learning techniques …

Nineteenth-Century Illustration and the Digital: Studies in Word and Image
J Thomas – 2017 – books.google.com
… media would lead to a dismantling of the hierarchy that privileges text over image and give rise to more interactions between the two forms: In a civilization marked by the increasing prominence of the visual, we can expect a change in this hierarchical relationship of text to image …

Multimodal Learning for Vision and Language
J Mao – 2017 – search.proquest.com
… For natural language, the Recurrent Neural Network (RNN) shows the state-of-the-art performance in many tasks, such as speech recognition and word embedding learning ([MKB10, MKB11, MSC13]) … Image Retrieval (Text to Image): R@1, R@5, R@10, Med r …

A Cost-Sensitive Visual Question-Answer Framework for Mining a Deep And-OR Object Semantics from Web Images
Q Zhang, YN Wu, SC Zhu – arXiv preprint arXiv:1708.03911, 2017 – arxiv.org
… [13] constructed hierarchical taxonomic relationships between categories. [48, 25, 39, 3] formulated the relationships between natural language and visual concepts. [1] further built a Turing test system. [16] modeled the contextual knowledge between objects …

A survey on heterogeneous transfer learning
O Day, TM Khoshgoftaar – Journal of Big Data, 2017 – Springer
… address such limitations. This, in effect, expands the application of transfer learning to many other real-world tasks such as cross-language text categorization, text-to-image classification, and many others. Heterogeneous transfer …

Explainable Recommendations
RC Kanjirathinkal – 2017 – cs.cmu.edu
… Additionally, we propose how explanations could be produced in such a setting by jointly ranking KG entities and items. KGs however operate in the domain of discrete entities, and are therefore limited in their ability to deal with natural language content …

Deep neural architectures for automatic representation learning from multimedia multimodal data
V Vukotic – 2017 – theses.fr
… Deep learning, and thus automatic representation learning, is not only an increasingly popular trend in computer vision. Other fields, such as natural language processing, speech recognition, multimodal retrieval systems and many others follow this trend …

PacGAN: The power of two samples in generative adversarial networks
Z Lin, A Khetan, G Fanti, S Oh – arXiv preprint arXiv:1712.04086, 2017 – arxiv.org
… Further, they implicitly learn a latent, low-dimensional representation of arbitrary high-dimensional data. Such embeddings have been hugely successful in the area of natural language processing (e.g., word2vec [31]). GANs …

Fuzzy Information Retrieval
DH Kraft, E Colvin – Synthesis Lectures on Information …, 2017 – morganclaypool.com
… personal information management, human information behavior, digital libraries, archives and preservation, cultural informatics, information retrieval evaluation, data fusion, relevance feedback, recommendation systems, question answering, natural language processing for …

A scoping review: exploring the world of medical and wellness tourism
HMCNC Rodrigues – 2017 – repositorio.iscte.pt
Business Research Unit (BRU-IUL). A Scoping Review: Exploring the World of Medical and Wellness Tourism. Helena Maria Correia Neves Cordeiro Rodrigues. A Thesis presented in partial fulfillment of the Requirements for the Degree of Doctor in Management …

Domain Adaptive Computational Models for Computer Vision
HKD Venkateswara – 2017 – search.proquest.com
Domain Adaptive Computational Models for Computer Vision. Abstract. The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to …
