Scene Understanding & Dialog Systems


Scene understanding is the ability of a system to perceive and interpret the contents and context of a scene or environment. It involves recognizing the objects, people, and events in a scene, the relationships between them, and the context in which they occur.

Scene understanding is relevant to dialog systems because it can ground how a dialog system interprets user input. A dialog system that knows the contents and context of a scene can resolve references to that scene, and so may generate more accurate and appropriate responses to user queries or requests.

For example, a scene understanding system in a smart home application might recognize the objects and people in a room and the relationships between them. A dialog system could use this information to interpret requests such as “Turn on the lights” or “What is the temperature in the living room?”, for instance by working out which room the speaker is in.
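The smart home case can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real system: the `Scene` class, its fields, and the `respond` function are hypothetical names invented here, and the scene data is filled in by hand rather than produced by a vision model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical output of a scene understanding component for one home."""
    # room name -> names of people currently detected in that room
    people: dict[str, list[str]] = field(default_factory=dict)
    # room name -> last temperature reading in degrees Celsius
    temperature_c: dict[str, float] = field(default_factory=dict)
    # room name -> whether the lights are on
    lights_on: dict[str, bool] = field(default_factory=dict)

    def room_of(self, person: str) -> str | None:
        """Return the room in which the person was detected, if any."""
        for room, occupants in self.people.items():
            if person in occupants:
                return room
        return None


TEMPERATURE_PREFIX = "what is the temperature in the "


def respond(utterance: str, speaker: str, scene: Scene) -> str:
    """Answer one request, using the scene to fill in missing context."""
    text = utterance.lower().strip(" ?.!")
    if text == "turn on the lights":
        # The request names no room, so use the room the speaker is in.
        room = scene.room_of(speaker)
        if room is None:
            return "Which room do you mean?"
        scene.lights_on[room] = True
        return f"Turning on the lights in the {room}."
    if text.startswith(TEMPERATURE_PREFIX):
        room = text[len(TEMPERATURE_PREFIX):]
        if room in scene.temperature_c:
            return f"It is {scene.temperature_c[room]:.1f} degrees C in the {room}."
        return f"I have no temperature reading for the {room}."
    return "Sorry, I did not understand that."
```

With a scene in which a person named "alex" has been detected in the kitchen, `respond("Turn on the lights", "alex", scene)` resolves the missing room from the scene instead of asking a follow-up question. A real system would replace the exact string matching with intent and entity recognition.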

  • Conversation system (also conversational system) is a software system that is designed to engage in natural language conversations with humans. Conversation systems can be used for a variety of purposes, such as answering questions, providing information, and engaging in dialog with users.
  • Emotional dialog generation is the process of generating natural language dialog that conveys emotional content or context. This can involve generating dialog that reflects the emotional state of the speaker or the intended emotional impact of the dialog on the listener.
  • Neural conversational agent is a type of conversation system that uses artificial neural networks to generate natural language responses to user input. Neural conversational agents are trained on large datasets of human-generated dialog and are designed to mimic the way that humans engage in natural language conversations.
  • Visual dialog is a conversational setting in which a system uses visual information, such as images or videos, in addition to natural language dialog to communicate with users. Visual dialog systems can be used to answer questions about images or videos, provide information about visual content, or hold a dialog with users that is grounded in visual information.
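As a rough illustration of the visual dialog entry above, the following Python sketch keeps a dialog history alongside hand-written image annotations, so that a follow-up question containing a pronoun can be answered about the object mentioned earlier. The `VisualDialog` class and its fields are hypothetical names invented for this sketch, and the annotations stand in for the output of a vision model; it is a toy under those assumptions, not how any particular system works.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VisualDialog:
    """Toy visual dialog state: image annotations plus the dialog history."""
    # object name -> attributes, standing in for the output of a vision model
    objects: dict[str, dict[str, str]]
    # (question, answer) pairs in the order they occurred
    history: list[tuple[str, str]] = field(default_factory=list)
    # object most recently talked about, used to interpret pronouns
    focus: str | None = None

    def ask(self, question: str) -> str:
        words = question.lower().strip(" ?.!").split()
        # If the question names a known object, it becomes the new focus;
        # otherwise a pronoun such as "it" falls back to the previous focus.
        for name in self.objects:
            if name in words:
                self.focus = name
        if self.focus is None:
            answer = "I am not sure which object you mean."
        elif "color" in words:
            answer = self.objects[self.focus].get("color", "unknown")
        elif "where" in words:
            answer = self.objects[self.focus].get("location", "unknown")
        else:
            answer = "I can only answer questions about color and location."
        self.history.append((question, answer))
        return answer
```

For an image annotated with a brown dog on a sofa, asking "What color is the dog?" and then "Where is it?" answers the second question about the dog, because the dialog state carries the focus from one turn to the next.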



See also:

100 Best GraphLab Videos
Text-to-Image Systems

Human-centered autonomous vehicle systems: Principles of effective shared autonomy
L Fridman – arXiv preprint arXiv:1810.01835, 2018 –
… In the case of fully autonomous vehicle undergoing testing today, machine learning is primarily used for the scene understanding problem but not for … For communication, we adjust the natural language dialogue system the vehicle uses to inform the driver about changes in risk …

Gap after the next two vehicles: A Spatio-temporally Situated Dialog for a Cooperative Driving Assistant
M Heckmann, D Orth, D Kolossa – … Communication; 13th ITG …, 2018 –
… Scene understanding: To be able to handle occlusions of the vehicles approaching from the right by the vehicles passing from the left side we … In this work we have presented a spatio-temporally situated dialog system, which makes references to quickly moving objects …

Analysis of a speech-based intersection assistant in real urban traffic
D Orth, N Steinhardt, B Bolder, M Dunn… – 2018 21st …, 2018 –
… for the analysis, we applied the following approach: We used the position which was delivered from the scene understanding component and … Three types of announcements could occur in the context of the previously described dialog system: Announce No Vehicle means that …

Towards Building Large Scale Multimodal Domain-Aware Conversation Systems
A Saha, MM Khapra, K Sankaranarayanan – Thirty-Second AAAI …, 2018 –
… et al. 2017; Das et al. 2016) involving a sequence of QA pairs with a single image forming a dialog, and the work of (de Vries et al. 2016) which focuses on scene understanding and reasoning from a single image. There are …

An Approach For Instant Conversion of Sensory Data of a Simulated Sensor of a Mobile Robot into Semantic Information
NTM Saeed, M Fathi, KD Kuhnert – 2018 IEEE International …, 2018 –
… They claim that the semantic mapping system results in real-time scene understanding, object detection, and recognition … RDF statements, the collected statements are used for a semantic knowledge base which is the basic building block needed for a dialog system in a natural …

Emotional Dialogue Generation using Image-Grounded Language Models
B Huber, D McDuff, C Brockett, M Galley… – Proceedings of the 2018 …, 2018 –
… difficult to interpret. In recent years, significant advances in computer vision have led to marked improvement in the state-of-the-art in object detection, scene understanding, and facial and body analysis [24]. We leverage these …

Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders
Y Wang, C Liu, M Huang, L Nie – arXiv preprint arXiv:1805.04843, 2018 –
… Thus, this task generally requires scene understanding to imagine and comprehend a scenario (e.g., dining at a restaurant) that can … Traditional question generation can be seen in task-oriented dialogue system (Curto et al., 2012), sentence transformation (Vanderwende, 2008 …

Construction of a voice driven life assistant system for visually impaired people
R Chen, Z Tian, H Liu, F Zhao… – … Conference on Artificial …, 2018 –
… In Table II, we can see that the whole dialogue system can recognize and extract the intent and entities from users’ voice with a well accuracy. Figure 7. Effect of object detection algorithm … (2016). The Cityscapes Dataset for Semantic Urban Scene Understanding …

Emotion-awareness for intelligent vehicle assistants: A research agenda
HJ Vögel, C Süß, T Hubregtsen, E André… – 2018 IEEE/ACM 1st …, 2018 –
… Cognitive Model of affective user state Cognitive Situational Model and Interior Scene Understanding Contextual Model Vehicle Context (Sensory Data) World Context (IoT Data, Exterior Scene) … Assistive dialogue systems have already employed such techniques …

Combination of Semantic Localization and Conversational Skills for Assistive Robots
D González-Medina, C Romero-González… – Workshop of Physical …, 2018 – Springer
… The robot has to be able to perform visual scene understanding as well as speech recognition and synthesis … that the recognition module of the objects shows promising results but the results regarding colour extraction need to be improved to be useful in the dialogue system …

A Survey on Deep Learning Toolkits and Libraries for Intelligent User Interfaces
J Zacharias, M Barz, D Sonntag – arXiv preprint arXiv:1803.04818, 2018 –
… In [19, 54] a deep reinforcement learning system for optimising a visually grounded goal-directed dialogue system was implemented using TensorFlow … Backend Visual scene understanding is an important property of an IUI which needs to process image or video input …

A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions
A Magassouba, K Sugiura, H Kawai – arXiv preprint arXiv:1806.03847, 2018 –
… Conventionally, this instruction can be disambiguated from a dialogue system, but at the cost of time and cumbersome interaction. Instead, we propose a multimodal approach, in which the instructions are disambiguated using the robot’s state and environment context …

Intelligent Human Computer Interaction
US Tiwary – Springer
… track was dedicated to exploring the area of natural language processing, e.g., natural language generation; dialog systems; speech-based … of interactive generation of textual and visual works; vision-based face, motion, and gesture capture; scene understanding; vision-based …

Multimodal Polynomial Fusion for Detecting Driver Distraction
Y Du, C Raman, AW Black, LP Morency… – arXiv preprint arXiv …, 2018 –
… Scene understanding is used along with information from a backward-facing camera (driver’s glances) to categorize driving behavior [13, 14] … There are voice interfaces installed in the vehicle, such as a spoken dialog system or personal assistant [16, 15] …

Object sequences: encoding categorical and spatial information for a yes/no visual question answering task
S Garg, R Srivastava – IET Computer Vision, 2018 –
… results. Such dialogue systems can also be modelled as goal-based decision-making tasks. In … discussed. Multi-modal dialogue systems offer a much more closer view to what artificial general intelligence would be like. Humans …

Ontology for Research in Artificial Intelligence
A Teferi, T Beshah –
… o papers ? Computer Vision [7, 8] • Object Recognition o papers • Scene Understanding o papers • Human Activity Recognition o papers • Active Vision … o papers • Context Modeling o papers • Dialog Systems o papers ? Robotics: subclasses [7, 8] • Mechanism design, sensors …

Interpreting Decision-Making in Interactive Visual Dialogue
U Sharma – 2018 –
… Abstract Dialogue systems that involve long-term planning can strongly benefit from a high-level notion of dialogue strategy and can avoid making poor … The visually-grounded dialogue task models a scene-understanding setup with dialogue as the principal method of discovery …

Image inspired poetry generation in xiaoice
WF Cheng, CC Wu, R Song, J Fu, X Xie… – arXiv preprint arXiv …, 2018 –
… Wen-Feng Cheng, Chao-Chung Wu, Ruihua Song, Jianlong Fu, Xing Xie, Jian-Yun Nie; Microsoft, National Taiwan University, University of Montreal …

Affective Processing Guides Behavior and Emotions Communicate Feelings: Towards a Guideline for the NeuroIS Community
P Walla – Information Systems and Neuroscience, 2018 – Springer
… development system by interacting with human EEG and natural scene understanding. Cogn. Syst. Res. 14(1), 37–49 (2012). 6. Callejas, Z., López-Cózar, R.: Influence of contextual information in emotion annotation for spoken dialogue systems …

Recursive Visual Attention in Visual Dialog
Y Niu, H Zhang, M Zhang, J Zhang, Z Lu… – arXiv preprint arXiv …, 2018 –
… Co-reference resolution has been used to improve visual comprehension in many tasks, such as visual grounding [13], action recognition [25, 26] and scene understanding [16]. According to [9], 98% of dialogs and 38% of questions in VisDial dataset have at least one pronoun …

Object referring in videos with language and human gaze
A Balajee Vasudevan, D Dai… – Proceedings of the …, 2018 –
… Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool; ETH Zurich, KU Leuven. Abstract: We investigate the problem of object referring (OR), i.e. …

Open-domain neural conversational agents: The step towards artificial general intelligence
S Arsovski, S Wong, AD Cheok – International Journal of …, 2018 –
… to pass the Turing Test and move AI research towards full AGI, more research effort towards open-domain dialogue system are presented … used for a variety of problems that can benefit from structure learning and exploitation, such as rich scene understanding in Reinforcement …

Engagement Recognition based on Multimodal Behaviors for Human-Robot Dialogue
K Inoue – 2018 –
… On the other hand, for spoken dialogue systems, the advantage of making use of robots is to be able to utilize multimodality … SIG [49] was designed to integrate audio and visual information for active audio-scene understanding that consists of tasks such as sound …

A Survey of Knowledge Representation and Retrieval for Learning in Service Robotics
D Paulius, Y Sun – arXiv preprint arXiv:1807.02192, 2018 –
… Specifically, we look at three broad categories involved in task representation and retrieval for robotics: 1) activity recognition from demonstrations, 2) scene understanding and interpretation, and 3) task representation in robotics – datasets and networks …

Classification of Things in DBpedia using Deep Neural Networks
R Parundekar – arXiv preprint arXiv:1802.02528, 2018 –
… standardized knowledge representation languages and services. Semantic Graphs are also used in other domains like Spoken Dialog Systems [3], Social Networks [4], Scene Understanding [3], Virtual & Augmented Reality [5], etc …

Tracking gaze and visual focus of attention of people involved in social interaction
B Massé, S Ba, R Horaud – IEEE transactions on pattern …, 2018 –
… Benoit Massé, Silèye Ba, and Radu Horaud. Abstract: The visual focus of attention (VFOA) has been recognized as a prominent conversational cue …

Review of State-of-the-Art in Deep Learning Artificial Intelligence
VV Shakirov, KP Solovyeva… – Optical Memory and …, 2018 – Springer
… ISSN 1060-992X, Optical Memory and Neural Networks, 2018, Vol. 27, No. 2, pp. 65–80. © Allerton Press, Inc., 2018 …

Driver Behavior and Environment Interaction Modeling for Intelligent Vehicle Advancements
Y Zheng – 2018 –
… Finally, a voice-based interface between the driver and vehicle is simulated, and natural language processing tasks are investigated in the design of a navigation dialogue system. The accuracy for intent detection (i.e., classify …

User-Adaptive Interaction in Social Robots: A Survey Focusing on Non-physical Interaction
GS Martins, L Santos, J Dias – International Journal of Social Robotics, 2018 – Springer
… Gonçalo S. Martins, Luís Santos, Jorge Dias …

Gaze Direction in the context of Social Human-Robot Interaction
B Massé – 2018 –
… HAL Id: tel-01936821. Submitted on 27 Nov 2018 …

Neural Task Planning with And-Or Graph Representations
T Chen, R Chen, L Nie, X Luo, X Liu… – IEEE Transactions on …, 2018 –
… In our experiments, we create a new dataset that contains diverse daily tasks and extensively evaluate the effectiveness of our approach. Index Terms—Scene understanding, Task planning, Action prediction, Recurrent neural network …

Semantic Mapping for Autonomous Robots in Urban Environments
C Landsiedel – 2018 –
… Developing building blocks for environment representations satisfying these requirements is one of the main aims of this thesis. Spoken Language Dialogue System and Action Planning The IURO robot’s main source of route information is the interaction with pedestrians …

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
A Gatt, E Krahmer – Journal of Artificial Intelligence Research, 2018 –
… For example, the generation of spoken utterances in dialogue systems (e.g., Walker, Stent, Mairesse, & Prasad, 2007; Rieser & Lemon, 2009; Dethlefs, 2014) is another application of NLG, but typically it is closely related to dialogue management, so that management and …

Learning socio-communicative behaviors of a humanoid robot by demonstration
DC Nguyen – 2018 –
… HAL Id: tel-01962544. Submitted on 20 Dec 2018 …

Image Processing Technologies
… Algorithms, Sensors, and Applications. Edited by Kiyoharu Aizawa, University of Tokyo, Tokyo, Japan; Katsuhiko Sakaue, Intelligent Systems Institute, Tsukuba, Japan; Yasuhito Suenaga, Nagoya University …

A Survey of Multi-View Representation Learning
Y Li, M Yang, ZM Zhang – IEEE Transactions on Knowledge …, 2018 –

Reserve Calendar Time for Every Project
B GE, Y LIGHT – What is the Next Generation Internet initiative?, 2018 –
… Communications of the ACM, December 2018, Vol. 61, No. 12. Have you ever come into work, sat down at your computer to begin a project, opened your editor, and then just stared at the screen? This …

Contextual Recurrent Level Set Networks and Recurrent Residual Networks for Semantic Labeling
NTH Le – 2018 –
… The main focus of this thesis is to investigate robust approaches that can tackle the challenging semantic labeling tasks including semantic instance segmentation and scene understanding … segmentation and scene understanding on these databases …

Speech Enhancement Based on Bayesian Low-Rank and Sparse Decomposition of Multichannel Magnitude Spectrograms
Y Bando, K Itoyama, M Konyo… – … on Audio, Speech …, 2018 –