Visual Dialog - Meta-Guide.com

Notes:

Visual dialog is a type of human-computer interaction in which a computer system holds a natural-language conversation with a human user about an image or other visual content. In the standard task formulation, the user asks a series of questions about the image and the system answers each one using the image and the dialog history as context. In other settings the roles are reversed: the system asks the questions and the user answers, so that the system learns more about the image and its contents through the conversation.

Visual dialog systems are often used in the field of artificial intelligence and machine learning, as they provide a rich and naturalistic way for systems to learn about visual content. They can also be used in applications such as image search, where users can search for images by providing natural language queries. Visual dialog systems can be implemented using a variety of techniques, such as natural language processing and deep learning.
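
The inputs a visual dialog system works with (an image, the dialog so far, and a new question) can be sketched as a small data structure. The following is a minimal illustration, not an implementation from any of the works listed below: the class name DialogState, the image identifier, and the word-overlap scorer are all hypothetical, and a real system would score answers with a trained vision-language model rather than by counting shared words.

```python
from dataclasses import dataclass, field


@dataclass
class DialogState:
    """One visual dialog episode: an image reference plus the rounds so far."""
    image_id: str   # hypothetical identifier for the image under discussion
    caption: str    # short description that seeds the dialog
    history: list[tuple[str, str]] = field(default_factory=list)  # (question, answer) rounds

    def add_round(self, question: str, answer: str) -> None:
        """Record one completed question-answer round."""
        self.history.append((question, answer))

    def context(self, question: str) -> str:
        """Flatten caption, past rounds, and the new question into one text string."""
        turns = [f"Q: {q} A: {a}" for q, a in self.history]
        return " ".join([self.caption, *turns, f"Q: {question}"])


def rank_candidates(context: str, candidates: list[str]) -> list[str]:
    """Toy scorer (an assumption for illustration): order candidate answers
    by how many words they share with the dialog context."""
    context_words = set(context.lower().split())

    def overlap(answer: str) -> int:
        return len(context_words & set(answer.lower().split()))

    return sorted(candidates, key=overlap, reverse=True)


state = DialogState(image_id="img_001", caption="a dog catching a frisbee in a park")
state.add_round("what animal is it?", "a dog")
prompt = state.context("what is it catching?")
ranked = rank_candidates(prompt, ["a frisbee", "two people", "no"])
print(prompt)
print(ranked[0])
```

Keeping the history as explicit (question, answer) pairs is what separates this setting from single-question answering: later questions such as "what is it catching?" only make sense in light of earlier rounds.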

Goal-oriented visual dialog is a type of visual dialog in which the conversation between the computer system and the user is focused on achieving a specific goal or task, such as identifying a target object or image. One participant asks questions and the other answers in order to gather the information needed to complete the task.

Image-to-text is a task in which an artificial intelligence system is trained to generate text descriptions of images. The output can be a single sentence or paragraph that summarizes the image, or a more detailed description of the objects, scenes, and other elements it depicts.

A visual chatbot is a chatbot that uses visual content, such as images or videos, as part of its conversation with the user. It may present the user with images or videos and then ask questions or make statements about them. Visual chatbots can be used in applications such as customer service, education, and entertainment.

A visual dialog agent is an artificial intelligence system designed to converse with a human user about visual content. It can ask questions, provide information, and respond to user input in natural language. Visual dialog agents can be used in applications such as image search, where they help users find images through natural language descriptions.

Visual dialog generation is the process of producing the system’s side of a conversation about visual content: generating questions or statements based on an image or video, and then generating appropriate responses to the user’s answers. It is an important component of visual dialog systems, as it allows the system to hold a natural conversation with the user.

A visual dialog model is a computational model of the processes involved in visual dialog. Such models are trained on large datasets of visual dialogs in order to learn the patterns and structures characteristic of them. Once trained, a model can be used to answer questions about images or to generate natural-sounding dialog.

A visual dialog system is a software system designed to support conversation between a human user and a computer about visual content. It may present the user with images or videos and then converse about them. Visual dialog systems can be used in applications such as image search, customer service, and education.

Visual dialogue (an alternative spelling of visual dialog) refers to a conversation between a human user and a computer system about visual content. The system may present the user with images or videos and then converse about them. Visual dialogue can be used to gather information about the visual content, or to help the user perform a specific task.

Visual question answering (VQA) is a task in which an artificial intelligence system is trained to answer questions about visual content. The system is given an image and a question in natural language and generates an appropriate answer. VQA can be seen as one-round visual dialog.
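
For the goal-oriented setting defined above, one common way to decide what to ask next is to pick the question whose answer is expected to reduce uncertainty about the target the most. The sketch below is a generic expected-information-gain calculation under strong simplifying assumptions (a known prior over candidate objects and a noise-free answerer); it is not the method of any specific paper in the reference list, and the candidate objects and questions are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, answers):
    """Expected entropy reduction over candidates from asking one question.

    prior:   dict candidate -> probability that it is the target
    answers: dict candidate -> answer the answerer would give if that
             candidate were the target (assumed known and noise-free)
    """
    gain = entropy(prior.values())
    for answer in set(answers.values()):
        # Candidates that remain possible if this answer is observed.
        consistent = {c: p for c, p in prior.items() if answers[c] == answer}
        p_answer = sum(consistent.values())
        if p_answer == 0:
            continue
        posterior = [p / p_answer for p in consistent.values()]
        gain -= p_answer * entropy(posterior)
    return gain


# Four candidate objects in a hypothetical image, equally likely a priori.
prior = {"red cup": 0.25, "blue cup": 0.25, "red book": 0.25, "green plant": 0.25}

# Hypothetical questions, each with the answer every candidate would produce.
questions = {
    "is it red?": {"red cup": "yes", "blue cup": "no", "red book": "yes", "green plant": "no"},
    "is it a plant?": {"red cup": "no", "blue cup": "no", "red book": "no", "green plant": "yes"},
}

gains = {q: expected_information_gain(prior, a) for q, a in questions.items()}
best_question = max(gains, key=gains.get)
print(best_question)
```

With a uniform prior, a question that splits the candidates evenly ("is it red?") is worth a full bit, whereas one that singles out a single candidate ("is it a plant?") is worth less on average, which is why the even split is selected here.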

Resources:

visualdialog.org .. agents that can hold dialogs with humans about visual content
visualqa.org .. dataset containing open-ended questions about images

References:

Visual dialog
A Das, S Kottur, K Gupta, A Singh… – Proceedings of the …, 2017 – openaccess.thecvf.com
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground …

Learning cooperative visual dialog agents with deep reinforcement learning
A Das, S Kottur, JMF Moura, S Lee… – Proceedings of the …, 2017 – openaccess.thecvf.com
We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative ‘image guessing’ game between two agents–Qbot and Abot–who communicate in natural language dialog so that Qbot can select an unseen …

Visual coreference resolution in visual dialog using neural module networks
S Kottur, JMF Moura, D Parikh… – Proceedings of the …, 2018 – openaccess.thecvf.com
Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more. We focus …

Visual reference resolution using attention memory for visual dialog
PH Seo, A Lehrmann, B Han, L Sigal – Advances in neural …, 2017 – papers.nips.cc
Visual dialog is a task of answering a series of inter-dependent questions given an input image, and often requires to resolve visual references among the questions. This problem is different from visual question answering (VQA), which relies on spatial attention (aka …

Best of both worlds: Transferring knowledge from discriminative learning to a generative visual dialog model
J Lu, A Kannan, J Yang, D Parikh… – Advances in Neural …, 2017 – papers.nips.cc
We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses …

Are you talking to me? reasoned visual dialog generation through adversarial learning
Q Wu, P Wang, C Shen, I Reid… – Proceedings of the …, 2018 – openaccess.thecvf.com
The Visual Dialogue task requires an agent to engage in a conversation about an image with a human. It represents an extension of the Visual Question Answering task in that the agent needs to answer a question about an image, but it needs to do so in light of the …

Recursive visual attention in visual dialog
Y Niu, H Zhang, M Zhang, J Zhang… – Proceedings of the …, 2019 – openaccess.thecvf.com
Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image. It typically needs to address two major problems: (1) How to answer visually-grounded questions, which is the core challenge in visual question …

Two can play this game: visual dialog with discriminative question generation and answering
U Jain, S Lazebnik, AG Schwing – Proceedings of the IEEE …, 2018 – openaccess.thecvf.com
Human conversation is a complex mechanism with subtle nuances. It is hence an ambitious goal to develop artificial intelligence agents that can participate fluently in a conversation. While we are still far from achieving this goal, recent progress in visual question answering …

Image-question-answer synergistic network for visual dialog
D Guo, C Xu, D Tao – … of the IEEE Conference on Computer …, 2019 – openaccess.thecvf.com
The image, question (combined with the history for de-referencing), and the corresponding answer are three vital components of visual dialog. Classical visual dialog systems integrate the image, question, and history to search for or generate the best matched answer, and so …

Multi-step reasoning via recurrent dual attention for visual dialog
Z Gan, Y Cheng, AE Kholy, L Li, J Liu, J Gao – arXiv preprint arXiv …, 2019 – arxiv.org
This paper presents a new model for visual dialog, Recurrent Dual Attention Network (ReDAN), using multi-step reasoning to answer a series of questions about an image. In each question-answering turn of a dialog, ReDAN infers the answer progressively through …

Flipdial: A generative model for two-way visual dialogue
D Massiceti, N Siddharth… – Proceedings of the …, 2018 – openaccess.thecvf.com
We present FlipDial, a generative model for Visual Dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FlipDial learns …

Clevr-dialog: A diagnostic dataset for multi-round reasoning in visual dialog
S Kottur, JMF Moura, D Parikh, D Batra… – arXiv preprint arXiv …, 2019 – arxiv.org
Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real …

Dual attention networks for visual reference resolution in visual dialog
GC Kang, J Lim, BT Zhang – arXiv preprint arXiv:1902.09368, 2019 – arxiv.org
Visual dialog (VisDial) is a task which requires an AI agent to answer a series of questions grounded in an image. Unlike in visual question answering (VQA), the series of questions should be able to capture a temporal context from a dialog history and exploit visually …

Answerer in questioner’s mind for goal-oriented visual dialogue
SW Lee, YJ Heo, BT Zhang – Visually-Grounded Interaction and …, 2018 – bi.snu.ac.kr
We propose an “answerer in questioner’s mind” (AQM) framework, a novel approach for goal-oriented dialogue. In AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer in the questioner’s model. The questioner figures out the …

Visual dialogue without vision or dialogue
D Massiceti, PK Dokania, N Siddharth… – arXiv preprint arXiv …, 2018 – arxiv.org
We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue, a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli. To do so, we develop an embarrassingly simple method …

Multimodal hierarchical reinforcement learning policy for task-oriented visual dialog
J Zhang, T Zhao, Z Yu – arXiv preprint arXiv:1805.03257, 2018 – arxiv.org
Creating an intelligent conversational system that understands vision and language is one of the ultimate goals in Artificial Intelligence (AI). Extensive research has focused on vision-to-language generation, however, limited …

Making history matter: History-advantage sequence training for visual dialog
T Yang, ZJ Zha, H Zhang – Proceedings of the IEEE …, 2019 – openaccess.thecvf.com
We study the multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history. Given a triplet: an image, Q&A history, and current question, all the prevailing methods follow a codec (i.e., encoder …

Answerer in questioner’s mind: information theoretic approach to goal-oriented visual dialog
SW Lee, YJ Heo, BT Zhang – Advances in neural information …, 2018 – papers.nips.cc
Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct …

Avatar therapy: an audio-visual dialogue system for treating auditory hallucinations.
M Huckvale, J Leff, G Williams – INTERSPEECH, 2013 – isca-speech.org
This paper presents a radical new therapy for persecutory auditory hallucinations (“voices”) which are most commonly found in serious mental illnesses such as schizophrenia. In around 30% of patients these symptoms are not alleviated by anti-psychotic medication. This …

Audio visual scene-aware dialog (avsd) challenge at dstc7
H Alamri, V Cartillier, RG Lopes, A Das, J Wang… – arXiv preprint arXiv …, 2018 – arxiv.org
… Progress on such systems can be made by integrating state-of-the-art technologies from multiple research areas including end-to-end dialog systems, visual dialog, and video description. We introduce the Audio Visual Scene-Aware Dialog (AVSD) challenge and dataset …

A visual dialog augmented interactive recommender system
T Yu, Y Shen, H Jin – Proceedings of the 25th ACM SIGKDD International …, 2019 – dl.acm.org
Traditional recommender systems rely on user feedback such as ratings or clicks to the items, to analyze the user interest and provide personalized recommendations. However, rating or click feedback are limited in that they do not exactly tell why users like or dislike an …

Improving goal-oriented visual dialog agents via advanced recurrent nets with tempered policy gradient
R Zhao, V Tresp – LaCATODA@IJCAI, 2018 – openreview.net
Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, training text-generating agents efficiently is still a considerable challenge. Commonly used policy-based dialogue agents often end up …

Adaptive Visual Dialog for Intelligent Tutoring Systems
J Ahn, M Chang, P Watson, R Tejwani… – … Conference on Artificial …, 2018 – Springer
Conversational dialog systems are well known to be an effective tool for learning. Modern approaches to natural language processing and machine learning have enabled various enhancements to conversational systems but they mostly rely on text-or speech-only …

Two causal principles for improving visual dialog
J Qi, Y Niu, J Huang, H Zhang – Proceedings of the IEEE …, 2020 – openaccess.thecvf.com
This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial). By “improving”, we mean that they can promote almost every existing VisDial model to the …

New Mexico’s Cuarto Centenario: History in Visual Dialogue
A Fields – The Public Historian, 2011 – online.ucpress.edu
Out of the aftermath of the New Mexico Cuarto Centenario (the four hundredth anniversary of the Spanish explorer Don Juan de Oñate’s 1598 settlement in present-day New Mexico) came a pledge to create a memorial for the conquistador. The memorial was envisioned as …

Ask no more: Deciding when to guess in referential visual dialogue
R Shekhar, T Baumgartner, A Venkatesh… – arXiv preprint arXiv …, 2018 – arxiv.org
Our goal is to explore how the abilities brought in by a dialogue manager can be included in end-to-end visually grounded conversational agents. We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making …

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue.
X Jiang, J Yu, Z Qin, Y Zhuang, X Zhang, Y Hu, Q Wu – AAAI, 2020 – aaai.org
Different from Visual Question Answering task that requires to answer only one question about an image, Visual Dialogue involves multiple questions which cover a broad range of visual content that could be related to any objects, relationships or semantics. The …

Learning goal-oriented visual dialog via tempered policy gradient
R Zhao, V Tresp – 2018 IEEE Spoken Language Technology …, 2018 – ieeexplore.ieee.org
Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this …

The Dual Language of Geometry in Gothic Architecture: The Symbolic Message of Euclidian Geometry versus the Visual Dialogue of Fractal Geometry
NS Ramzy – Peregrinations: Journal of Medieval Art and …, 2015 – digital.kenyon.edu
When performing geometrical analysis of historical buildings, it is important to keep in mind what were the intentions of the originators, even though these intentions have likely changed many times as the master masons changed. For the medieval builders, geometry in …

Large-scale pretraining for visual dialog: A simple state-of-the-art baseline
V Murahari, D Batra, D Parikh, A Das – arXiv preprint arXiv:1912.02379, 2019 – arxiv.org
Prior work in visual dialog has focused on training deep neural models on the VisDial dataset in isolation, which has led to great progress, but is limiting and wasteful. In this work, following recent trends in representation learning for language, we introduce an approach to …

Modality-Balanced Models for Visual Dialogue.
H Kim, H Tan, M Bansal – AAAI, 2020 – aaai.org
The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue. However, via manual analysis, we find that a large number of conversational questions can be answered by only …

Dual Visual Attention Network for Visual Dialog.
D Guo, H Wang, M Wang – IJCAI, 2019 – pdfs.semanticscholar.org
Visual dialog is a challenging task, which involves multi-round semantic transformations between vision and language. This paper aims to address cross-modal semantic correlation for visual dialog. Motivated by that Vg (global vision), Vl (local vision), Q (question) and H …

Examining cooperation in visual dialog models
M Mironenco, D Kianfar, K Tran, E Kanoulas… – arXiv preprint arXiv …, 2017 – arxiv.org
In this work we propose a blackbox intervention method for visual dialog models, with the aim of assessing the contribution of individual linguistic or visual components. Concretely, we conduct structured or randomized interventions that aim to impair an individual …

Context, attention and audio feature explorations for audio visual scene-aware dialog
SH Kumar, E Okur, S Sahay, JJA Leanos… – arXiv preprint arXiv …, 2018 – arxiv.org
… tion Answering (VQA) (Antol et al. 2015) further extended the Video Captioning and Neural Machine Translation work and incorporated question-to-answer generation via end-to-end encoder-decoder framework. Visual Dialog task (Das et al … Visual dialog …

Efficient attention mechanism for handling all the interactions between many inputs with application to visual dialog
VQ Nguyen, M Suganuma, T Okatani – arXiv preprint arXiv:1911.11390, 2019 – arxiv.org
It has been a primary concern in recent studies of vision and language tasks to design an effective attention mechanism dealing with interactions between the two modalities. The Transformer has recently been extended and applied to several bi-modal tasks, yielding …

Improving generative visual dialog by answering diverse questions
V Murahari, P Chattopadhyay, D Batra, D Parikh… – arXiv preprint arXiv …, 2019 – arxiv.org
Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this ‘self-talk’ approach can lead to improved performance at the downstream dialog-conditioned image-guessing …

Images of Doctors and their Implements: A Visual Dialogue between the Patient and the Doctor
PA Baker – Homo Patiens-Approaches to the Patient in the Ancient …, 2016 – brill.com
Images of physicians, patients, and medical instruments were placed on Graeco-Roman funerary monuments, altars and fresco paintings. These representations are examined here to determine whether there existed a standard convention by which physicians were …

What Should I Ask? Using Conversationally Informative Rewards for Goal-Oriented Visual Dialog
P Shukla, C Elmadjian, R Sharan, V Kulkarni… – arXiv preprint arXiv …, 2019 – arxiv.org
The ability to engage in goal-oriented conversations has allowed humans to gain knowledge, reduce uncertainty, and perform tasks more efficiently. Artificial agents, however, are still far behind humans in having goal-driven conversations. In this work, we focus on the …

A simple baseline for audio-visual scene-aware dialog
I Schwartz, AG Schwing… – Proceedings of the IEEE …, 2019 – openaccess.thecvf.com
… vision, a tremendous amount of recent work has focused on image captioning [68, 30, 11, 16, 75, 45, 77, 31, 69, 4, 15, 10], visual question generation [36, 48, 47, 28], visual question answering [5, 19, 59, 54, 44, 73, 74, 76, 57, 58, 49, 50], and very recently visual dialog [13, 14 …

Response to “Visual Dialogue without Vision or Dialogue” (Massiceti et al., 2018)
A Das, D Parikh, D Batra – arXiv preprint arXiv:1901.05531, 2019 – arxiv.org
In a recent workshop paper, Massiceti et al. presented a baseline model and subsequent critique of Visual Dialog (Das et al., CVPR 2017) that raises what we believe to be unfounded concerns about the dataset and evaluation. This article intends to rebut the …

Generative Visual Dialogue System via Weighted Likelihood Estimation.
H Zhang, S Ghosh, LP Heck, S Walsh, J Zhang… – IJCAI, 2019 – ijcai.org
The key challenge of generative Visual Dialogue (VD) systems is to respond to human queries with informative answers in natural and contiguous conversation flow. Traditional Maximum Likelihood Estimation-based methods only learn from positive responses but …

Iterative Context-Aware Graph Inference for Visual Dialog
D Guo, H Wang, H Zhang, ZJ Zha… – Proceedings of the …, 2020 – openaccess.thecvf.com
Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts. This task can refer to the relation inference in a graphical model with sparse contexts and unknown graph structure (relation …

Generative visual dialogue system via adaptive reasoning and weighted likelihood estimation
H Zhang, S Ghosh, L Heck, S Walsh, J Zhang… – arXiv preprint arXiv …, 2019 – arxiv.org
The key challenge of generative Visual Dialogue (VD) systems is to respond to human queries with informative answers in natural and contiguous conversation flow. Traditional Maximum Likelihood Estimation (MLE)-based methods only learn from positive responses …

Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation
SW Lee, T Gao, S Yang, J Yoo, JW Ha – arXiv preprint arXiv:1902.08355, 2019 – arxiv.org
Answerer in Questioner’s Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented dialog systems. AQM benefits from asking a question that would maximize the information gain when it is asked. However, due to its intrinsic …

Revolution versus Counter-Revolution: the People’s Party and the Royalist(s) in visual dialogue
T Chotpradit – 2016 – bbktheses.da.ulcc.ac.uk
The People’s Party (Khana ratsadon) or the monarchy: which one is the true begetter of Thai democracy? The people or the King: who possesses sovereign power in Thailand? The thesis Revolution versus Counter-Revolution: The People’s Party and the Royalist(s) in …

Image understanding for visual dialog
Y Cho, I Kim – Journal of Information Processing Systems, 2019 – jips-k.org
This study proposes a deep neural network model based on an encoder–decoder structure for visual dialogs. Ongoing linguistic understanding of the dialog history and context is important to generate correct answers to questions in visual dialogs followed by questions …

Visual Dialogue State Tracking for Question Generation.
W Pang, X Wang – AAAI, 2020 – aaai.org
GuessWhat?! is a visual dialogue task between a guesser and an oracle. The guesser aims to locate an object supposed by the oracle oneself in an image by asking a sequence of Yes/No questions. Asking proper questions with the progress of dialogue is vital for …

Granular Multimodal Attention Networks for Visual Dialog
BN Patro, S Patel, VP Namboodiri – arXiv preprint arXiv:1910.05728, 2019 – arxiv.org
Vision and language tasks have benefited from attention. There have been a number of different attention models proposed. However, the scale at which attention needs to be applied has not been well examined. Particularly, in this work, we propose a new method …

Ensemble based discriminative models for Visual Dialog Challenge 2018
S Agarwal, R Goyal – arXiv preprint arXiv:2001.05865, 2020 – arxiv.org
This manuscript describes our approach for the Visual Dialog Challenge 2018. We use an ensemble of three discriminative models with different encoders and decoders for our final submission. Our best performing model on ‘test-std’ split achieves the NDCG score of 55.46 …

End-to-end audio visual scene-aware dialog using multimodal attention-based video features
C Hori, H Alamri, J Wang, G Wichern… – ICASSP 2019-2019 …, 2019 – ieeexplore.ieee.org
… As a further step towards conversational visual AI, the new task of visual dialog was introduced [7], in which an AI agent holds a meaningful dialog with humans about a static image using natural, conversational language [8]. While VQA and visual dialog take significant steps …

Grounded Agreement Games: Emphasizing Conversational Grounding in Visual Dialogue Settings
D Schlangen – arXiv preprint arXiv:1908.11279, 2019 – arxiv.org
Where early work on dialogue in Computational Linguistics put much emphasis on dialogue structure and its relation to the mental states of the dialogue participants (e.g., Allen 1979, Grosz & Sidner 1986), current work mostly reduces dialogue to the task of producing at any …

Painting words: Severn’s visual dialogue with Keats in The Fountain (1828)
GF Scott – Word & Image, 2015 – Taylor & Francis
Although the painter Joseph Severn left little in the way of textual commentary on Keats’s poetry in his extensive body of letters and memoirs, he did offer a rich analysis and response to the poet’s verse in his artwork, especially his portraits and his best early …

A visual dialogue on ‘healthy’ human embryos from the sixteenth to the twenty-first centuries
L McTavish – The ‘Healthy’ Embryo. Social, Biomedical, Legal and …, 2010 – academia.edu
What can an art historian contribute to current debates about embryos? As a specialist of seventeenth-century French visual culture, I normally analyse paintings, sculptures and engraved prints. Though these prints sometimes feature medical images of the unborn, they …

Audio visual scene-aware dialog
H Alamri, V Cartillier, A Das, J Wang… – Proceedings of the …, 2019 – openaccess.thecvf.com
… unnatural. Our focus in AVSD is on settings involving multiple rounds of questions that require natural free-form answers. Visual Dialog: Our work is directly related to the image-based dialog task (VisDial) introduced by Das et al. [6 …

Audio visual scene-aware dialog (avsd) track for natural language generation in dstc7
H Alamri, C Hori, TK Marks, D Batra… – DSTC7 at AAAI2019 …, 2018 – workshop.colips.org
… As a further step towards conversational visual AI, the new task of visual dialog was introduced (Das et al … 2017). While VQA and visual dialog take significant steps towards human-machine interaction, they only consider a single static image …

Audio Visual Scene-Aware Dialog Track in DSTC8
C Hori, A Cherian, TK Marks, F Metze – DSTC Track Proposal, 2018 – workshop.colips.org
… to better handle flexible conversations by enabling model training on large conversational datasets [1, 2, 3]. In the field of computer vision, interaction with humans about visual information has been explored in visual question answering (VQA) by [4] and visual dialog by [5 …

Visual Reference Resolution using Attention Memory for Visual Dialog
P Hongsuck Seo, A Lehrmann, B Han, L Sigal – arXiv, 2017 – ui.adsabs.harvard.edu
Visual dialog is a task of answering a series of inter-dependent questions given an input image, and often requires to resolve visual references among the questions. This problem is different from visual question answering (VQA), which relies on spatial attention (aka visual …

Enhanced Visual Dialog
G Singh, M Bajaj, S Khandelwal – gursimar.github.io
We focus on the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, a dialog history, and a question about the image, the agent has to ground the …

Visual Dialog: Towards Communicative Visual Agents
S Kottur – 2019 – kilthub.cmu.edu
Recent years have seen significant advancements in artificial intelligence (AI). Still, we are far from intelligent agents that can visually perceive their surroundings, reason, and interact with humans in natural language, thereby being an integral part of our lives. As a step …

Visual Dialog with Targeted Objects
Q Wang, Y Han – … Conference on Multimedia and Expo (ICME), 2019 – ieeexplore.ieee.org
Visual Dialog aims to exchange information of the image between a questioner and an answerer through asking and answering questions alternately. To generate an accurate response to the target question requires to understand the visual information of the image …

Textual-Visual Reference-aware Attention Network for Visual Dialog
D Guo, H Wang, S Wang… – IEEE Transactions on …, 2020 – ieeexplore.ieee.org
Visual dialog is a challenging task in multimedia understanding, which requires the dialog agent to answer a series of questions that are based on an input image. The critical issue to produce an exact answer is how to model the mutual semantic interaction among feature …

Visual Dialog with Multi-turn Attentional Memory Network
D Kong, F Wu – Pacific Rim Conference on Multimedia, 2018 – Springer
Visual dialog is a task of answering a question given an input image, a historical dialog about the image and often requires to retrieve visual and textual facts about the question. This problem is different from visual question answering (VQA), which only relies on visual …

Multi-View Attention Networks for Visual Dialog
S Park, T Whang, Y Yoon, H Lim – arXiv preprint arXiv:2004.14025, 2020 – arxiv.org
Visual dialog is a challenging vision-language task in which a series of questions visually grounded by a given image are answered. To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., question, dialog history, image, and …

Strangely Familiar: A Visual Dialogue
M Hellis – 2012 – digital.lib.washington.edu
How we see defines what we see: vision is organized by the conditions of our existence, by history and by context. For myself, my journeys in and through photography are in some ways liberated by this knowledge, in the sense that if I bring an awareness of this …

Guessing State Tracking for Visual Dialogue
W Pang, X Wang – arXiv preprint arXiv:2002.10340, 2020 – arxiv.org
The Guesser plays an important role in GuessWhat?! like visual dialogues. It locates the target object in an image supposed by an oracle oneself over a question-answer based dialogue between a Questioner and the Oracle. Most existing guessers make one and only …

DialGraph: Sparse Graph Learning Networks for Visual Dialog
GC Kang, J Park, H Lee, BT Zhang, JH Kim – arXiv preprint arXiv …, 2020 – arxiv.org
Visual dialog is a task of answering a sequence of questions grounded in an image utilizing a dialog history. Previous studies have implicitly explored the problem of reasoning semantic structures among the history using softmax attention. However, we argue that the …

The Short Story and the Verbal-Visual Dialogue
CL Rallo, M Laura, M Isabel – … to English and American Studies in Spain, 2014 – aedean.org
This round table aims at examining the possibilities the short story offers for verbal-visual dialogue. Apart from approaching the generic implications of such a dialogue for short fiction using the work of AS Byatt as a referent, the round table explores the transformation of the …

Towards Environment Aware Social Robots using Visual Dialog
A Singh, M Ramanathan, R Satapathy… – aalind0.github.io
State of the art social robots are limited in their ability to understand their environment and have a meaningful conversation based on it. Visual Dialog is a research field that combines computer vision and natural language processing techniques to achieve visual awareness …

Probabilistic framework for solving Visual Dialog
BN Patro, VP Namboodiri – arXiv preprint arXiv:1909.04800, 2019 – arxiv.org
In this paper, we propose a probabilistic framework for solving the task of ‘Visual Dialog’. Solving this task requires reasoning and understanding of visual modality, language modality, and common sense knowledge to answer. Various architectures have been …

Visual Dialog for Radiology: Data Curation and First Steps
O Kovaleva, C Shivade, S Kashyap… – ViGIL …, 2019 – vigilworkshop.github.io
Recent work in clinical AI has been focusing on solving tasks that involve both image understanding and reading comprehension. In this study, we further pursue this line of research and introduce the first Visual Dialog task in Radiology, which adds complexity to …

ORD: Object Relationship Discovery for Visual Dialogue Generation
Z Wang, Z Huang, Y Luo, H Lu – arXiv preprint arXiv:2006.08322, 2020 – arxiv.org
With the rapid advancement of image captioning and visual question answering at single-round level, the question of how to generate multi-round dialogue about visual content has not yet been well explored. Existing visual dialogue methods encode the image into a fixed …

Adding object detection skills to visual dialogue agents
G Bani, D Belli, G Dagan, A Geenen… – Proceedings of the …, 2018 – openaccess.thecvf.com
Our goal is to equip a dialogue agent that asks questions about a visual scene with object detection skills. We take the first steps in this direction within the GuessWhat?! game. We use Mask R-CNN object features as a replacement for ground-truth annotations in the …

Recurrent Attention Network with Reinforced Generator for Visual Dialog
H Fan, L Zhu, Y Yang, F Wu – ACM Transactions on Multimedia …, 2018 – hehefan.github.io
In Visual Dialog, an agent has to parse temporal context in the dialog history and spatial context in the image to hold a meaningful dialog with humans. For example, to answer “what is the man on her left wearing?”, the agent needs to: 1) analyze the temporal context in the dialog history …

Building Common Ground in Visual Dialogue: The PhotoBook Task and Dataset
J Haber, E Bruni, R Fernández – semdial.org
The past few years have seen an increasing interest in developing computational agents for visually grounded dialogue, the task of using natural language interaction to communicate about visual content. Current challenges include posing and answering questions about a …

VISUAL DIALOG MANAGER: A TOOL FOR AUTHORING CHATBOTS
MI Oladele, MSB Sunar, A David – researchgate.net
Conversational agents or Chatbots enables communication through short typed interactions as well as the tendency of carrying on several asynchronous conversations at the same time, which is naturally comfortable for humans. However, the process of creating chatbots …

Attention Memory for Locating an Object through Visual Dialogue
C Han, Y Heo, W Kang, J Jun, BT Zhang – bi.snu.ac.kr
GuessWhat?! is a cooperative guessing game where two agents interact through language in visual contexts. The goal of the game is to locate a target object in a rich scene through dialogue, consisting of question-answering. In this paper, we propose a questioner model …

Deep Reinforcement Learning for Visual Dialogue Agents
Y Cho, J Hwang, I Kim – Proceedings of the Korea Information …, 2018 – koreascience.or.kr

A Revised Generative Evaluation of Visual Dialogue
D Massiceti, V Kulharia, PK Dokania… – arXiv preprint arXiv …, 2020 – arxiv.org
Evaluating Visual Dialogue, the task of answering a sequence of questions relating to a visual input, remains an open research challenge. The current evaluation scheme of the VisDial dataset computes the ranks of ground-truth answers in predefined candidate sets …

Visual dialogue: A drawn conversation about the city of Fez
S Jamouchi – 2017 – 3.121.149.17
The visual essay with moving images, see the link below to the video “Visual Dialogue Fez”, is an invitation to the viewer to approach the urban experience given by the female participant of this project. She narrates and at the same time draw the city of Fez, which she …

Towards Hands-Free Visual Dialog Interactive Recommendation
T Yu, Y Shen, H Jin – Proceedings of the AAAI Conference on Artificial …, 2020 – aaai.org
With the recent advances of multimodal interactive recommendations, the users are able to express their preference by natural language feedback to the item images, to find the desired items. However, the existing systems either retrieve only one item or require the user …

Multi-Modal fusion with multi-level attention for Visual Dialog
J Zhang, Q Wang, Y Han – Information Processing & Management, 2019 – Elsevier
Given an input image, Visual Dialog is introduced to answer a sequence of questions in the form of a dialog. To generate accurate answers for questions in the dialog, we need to consider all information of the dialog history, the question, and the image. However, existing …

Interpreting Decision-Making in Interactive Visual Dialogue
U Sharma – 2018 – esc.fnwi.uva.nl
Dialogue systems that involve long-term planning can strongly benefit from a high-level notion of dialogue strategy and can avoid making poor decisions early in the game and opt for broadly successful strategies instead. A strategy-signal can additionally be used as a …

Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation
M Zhou, J Arnold, Z Yu – arXiv preprint arXiv:1909.05365, 2019 – arxiv.org
Reinforcement learning (RL) is an effective approach to learn an optimal dialog policy for task-oriented visual dialog systems. A common practice is to apply RL on a neural sequence-to-sequence (seq2seq) framework with the action space being the output vocabulary in the …

Integration of Visual Perception in Dialogue Understanding for Virtual Humans in Multi-Party interaction.
D Traum, LP Morency – International Workshop on Interacting with ECAs …, 2010 – cs.huji.ac.il
… 6.3 Proof of Concept Validations While we have not yet had a chance to do a full evaluation of the impact that the inclusion of visual dialogue act recognition has on a user’s negotiation experience, we have done preliminary testing on several case studies that show …

History for Visual Dialog: Do we really need it?
S Agarwal, T Bui, JY Lee, I Konstas… – arXiv preprint arXiv …, 2020 – arxiv.org
Visual Dialog involves “understanding” the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response. In this paper, we show that co-attention models …

A worthwhile dialogue? Sanda Miller assesses the fruits of a visual dialogue between two fashion titans
S Miller – Apollo, 2012 – go.gale.com
Elsa Schiaparelli, writes Judith Thurman in her introduction to this fine book, ‘didn’t want to please. She wanted to dominate.’ Miuccia Prada, meanwhile, ‘dominates the runway and the fashion press season after season by pleasing herself.’ Thus with a ‘dominatrix’ on one side …

Efficient Visual Dialog Policy Learning via Positive Memory Retention
R Zhao, V Tresp – nips2018vigil.github.io
This paper is concerned with the training of recurrent neural networks as goal-oriented visual dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large amount of samples. However, the collection of the required data in …

Sensuous Ethnography: A Visual Dialogue with Iranian Transmigrants
S Javdani – Visual Anthropology, 2016 – Taylor & Francis
Focusing on the growing trend of skilled emigration in Iran, which is often examined using a conventional textual discourse, I aim to develop a model that is multilingual and interdisciplinary to create a context that allows a deeper understanding of the sensorial …

DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
F Chen, F Meng, J Xu, P Li, B Xu, J Zhou – arXiv preprint arXiv:1912.08360, 2019 – aaai.org
Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an appropriate …

Learning goal-oriented visual dialog agents: Imitating and surpassing analytic experts
YW Chang, WH Peng – 2019 IEEE International Conference on …, 2019 – ieeexplore.ieee.org
This paper tackles the problem of learning a questioner in the goal-oriented visual dialog task. Several previous works adopt model-free reinforcement learning. Most pretrain the model from a finite set of human-generated data. We argue that using limited …

The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding
Y Yao, J Xu, B Xu – Proceedings of the 2019 Conference of the North …, 2019 – aclweb.org
Visual Dialog is a multi-modal task that requires a model to participate in a multi-turn human dialog grounded on an image, and generate correct, human-like responses. In this paper, we propose a novel Adversarial Multi-modal Feature Encoding (AMFE) framework for …

CREATING AUDIO VISUAL DIALOGUE TASK AS STUDENTS’ SELF ASSESSMENT TO ENHANCE THEIR SPEAKING ABILITY
N Trisanti – UNNES International Conference on ELTLT, 2012 – eltlt.proceedings.id
The study is about giving overview of employing audio visual dialogue task as students creativity task and self assessment in EFL speaking class of tertiary education to enhance the students speaking ability. The qualitative research was done in one of the speaking …

The coloniality of Seeing: Towards a new inter-epistemic visual dialogue
J Barriendos – Nómadas, 2011 – scielo.org.co
* This article is a reworking of some of the materials I used to teach the seminar “La colonialidad del ver: la invención del canibalismo de Indias y los imaginarios visuales trasatlánticos de la modernidad/colonialidad” (The coloniality of seeing: the invention of the cannibalism of the Indies and the transatlantic visual imaginaries of modernity/coloniality). The seminar took place during …

The Devil is in the Details: A Magnifying Glass for the GuessWhich Visual Dialogue Game
A Testoni, R Shekhar, R Fernández, R Bernardi – semdial.org
Grounded conversational agents are a fascinating research line on which important progress has been made lately thanks to the development of neural network models and to the release of visual dialogue datasets. The latter have been used to set visual dialogue …

Malick’s Poetics of Gaze. A Visual Dialogue with Thoreau via Cavell
P Alzola – Film-Philosophy Conference 2017, 2017 – film-philosophy.com
Terrence Malick’s The Thin Red Line (1998) presents significant deviations from James Jones’s 1962 novel on which it is based. Most remarkable among these are the film’s philosophical concerns, hinged on the dramatic construction of Witt (Jim Caviezel) and his …

Accessing National Bibliographic Data in Visual Dialog with Biographic Data
Y Sommerland – 2016 – library.ifla.org
Strategy development and technical solutions for exploiting the full potential of national bibliographic data are central to ongoing efforts in efficiently meeting goals at national libraries. This paper presents methods and models for evaluating the usefulness of national …

A Visual Dialogue: What are the Inter Relational Dynamics of Grief?
B Sanstrom – THE INTERNATIONAL, 2012 – researchgate.net
In contemporary society, grief is a universal and multi-faceted human response to significant personal change or loss such as the death of a loved one, separation, or divorce. This paper will argue that inter-relational dynamics of grief can be communicated visually through …

Design Comments; a visual, dialogue-based approach to facilitate transdisciplinary collaboration and holistic solutions in climate change adaptation projects
KM Wiberg – … conference: Working together to prepare for …, 2019 – adk.elsevierpure.com
Results: The case study insights were conceptualised into ‘Design Comments’, as a method to mediate different bodies of knowledge and seek holistic CCA solutions in transdisciplinary contexts. The Design Comments consist of four elements: (1) the production and use of landscape …

Participatory Video and the Pacifica Mamas: exploring visual dialogue as an enabler for social and economic change
E Papoutsaki, M Williams, C Davis, S Kailahi, M Naqvi – 2014 – survey.unitec.ac.nz

Visual dialogue: learning journeys, 21st century skills, and arts-based research in the university art studio
LE Monsivais – 2017 – tamucc-ir.tdl.org
Alignment between liberal arts and professional preparation has a distinct, if not complex, historical relationship. However, little is known about how university art professors engage students in learning experiences that are situated in contemporary conditions of …

Computerising Leonardo: a visual dialogue from 1988 to now
M Kemp – 2017 – pl02.donau-uni.ac.at
I will begin with an excursus of computer vision techniques for exploring space in Renaissance paintings. I will then be looking at successive attempts in exhibitions and at one CD-ROM to use the dynamics of computer graphics to realise Leonardo’s underlying …

Looking at Botong Francisco from the Horizon of Diego Rivera: A Visual Dialogue between Two Modern Muralists
FA Demeterio III – Philippiniana sacra, 2013 – academia.edu
This paper explored the visual aesthetics of Carlos Francisco in a dialogical manner using Diego Rivera’s visual aesthetics as a point of reference. In particular this paper comparatively analyzed Francisco’s “Filipino Struggles through History” and Rivera’s “Man …

The art of complexity: using visual artefacts and dialogue to bridge the gap between strategic plans and local actions in organisations
J Burton, S Mockett – … of Research Methods in Complexity Science, 2018 – elgaronline.com
… people responsible for delivering it. We will also show how we have used the Visual Dialogue process as an Organisational Development intervention to address some of the key aspects of these challenges. As the process has …

Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog.
C Hori, A Cherian, TK Marks, T Hori – INTERSPEECH, 2019 – merl.com
… to-end approaches have also been shown to better handle flexible conversations between the user and the system by training the model on large conversational datasets [1, 2]. Using such end-to-end frameworks, visual question answering (VQA) [3] and visual dialog [4] have …

Chat-crowd: A dialog-based platform for visual layout composition
P Cascante-Bonilla, X Yin, V Ordonez… – arXiv preprint arXiv …, 2018 – arxiv.org
… arXiv preprint arXiv:1811.12354. Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra. 2017. Visual dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2 …

Multimodal dialog for browsing large visual catalogs using exploration-exploitation paradigm in a joint embedding space
I Bhattacharya, A Chowdhury, VC Raykar – Proceedings of the 2019 on …, 2019 – dl.acm.org
… 2017. Visual dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol … 2017. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. arXiv preprint arXiv:1703.06585 (2017) …

Dialogue & reflective visual journaling: A means to empower pre-service art educators
RD Bradshaw – 2014 – gildedgreen.com
… exchange, community and voice—that of the student (Skidmore, 2005). Establishing a visual journaling practice created spaces for students and I to engage in visual dialogue. Having time to process, reflect, and critique the course …

Audio Visual Scene-Aware Dialog System Using Dynamic Memory Networks
H Xie, I Iacobacci – researchgate.net
… Das et al. (2017) introduced an additional visual input to dialog systems and proposed the Visual Dialog (VisDial) problem which asks a model to carry out conversations about a static image … Visual dialog. In CVPR, 326–335 …

Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog
SH Kumar, E Okur, S Sahay, J Huang… – arXiv preprint arXiv …, 2019 – arxiv.org
… [4] Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, Jose MF Moura, Devi Parikh, and Dhruv Batra. Visual dialog. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017 …

Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog
SH Kumar, E Okur, S Sahay, J Huang… – arXiv preprint arXiv …, 2019 – arxiv.org
… cc/paper/2070-latent-dirichlet-allocation. [6] A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, JMF Moura, D. Parikh, and D. Batra. Visual dialog. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017. doi: 10.1109/cvpr.2017.121 …

Verbal-Visual Intertextuality: How do Multisemiotic Texts Dialogue?/Intertextualidade verbo-visual: como os textos multissemióticos dialogam?
L Mozdzenski – academia.edu
… In spite of building very different female identities, the music video Cherish and the film At Land establish a visual dialogue, at least implicitly – since neither Madonna nor the director Herb Ritts have acknowledged any influence of Maya Deren’s work. In …