Notes:
Visual question answering (VQA) is an artificial intelligence (AI) task in which a computer system generates a natural language answer to a question about an input image or a sequence of images. In other words, a VQA system "sees" an image and then gives an appropriate written response to a question about its contents.
VQA systems are typically trained on large datasets of images paired with questions and answers. From these examples the system learns how visual features relate to the kinds of questions asked about them and to the answers that fit.
VQA systems have a wide range of potential applications, including image and video captioning, image and video search, and image-based question answering in educational contexts. They can also give people with visual impairments or other disabilities additional context and information about images or videos.
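To make the training data described above concrete, here is a minimal sketch in Python. It assumes each example is an (image, question, answer) triple and implements only a language-only baseline that ignores the image and returns the most frequent training answer for a crude "question type" (the first two words of the question). The names `VQAExample`, `question_type`, and `QuestionPriorBaseline`, and the toy data, are hypothetical illustrations and are not taken from any of the papers listed below.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VQAExample:
    """One training triple. image_id is a hypothetical stand-in for pixel data."""
    image_id: str
    question: str
    answer: str


def question_type(question: str, n_words: int = 2) -> str:
    """Crude question type: the first n_words words, lowercased."""
    return " ".join(question.lower().split()[:n_words])


class QuestionPriorBaseline:
    """Ignores the image and answers from question statistics alone."""

    def __init__(self) -> None:
        self.by_type = defaultdict(Counter)  # question type -> answer counts
        self.overall = Counter()             # answer counts over all questions

    def fit(self, examples):
        for ex in examples:
            self.by_type[question_type(ex.question)][ex.answer] += 1
            self.overall[ex.answer] += 1
        return self

    def predict(self, question: str) -> str:
        # Fall back to the overall answer counts for unseen question types.
        counts = self.by_type.get(question_type(question)) or self.overall
        if not counts:
            raise ValueError("baseline has not been fitted")
        return counts.most_common(1)[0][0]


# Toy data, invented for illustration only.
train = [
    VQAExample("img1", "What color is the bus?", "red"),
    VQAExample("img2", "What color is the sky?", "blue"),
    VQAExample("img3", "What color is the car?", "red"),
    VQAExample("img4", "Is there a dog?", "yes"),
    VQAExample("img5", "Is there a cat in the picture?", "yes"),
    VQAExample("img6", "Is there a person?", "no"),
]

model = QuestionPriorBaseline().fit(train)
print(model.predict("What color is the train?"))
print(model.predict("Is there a bicycle?"))
```

A real VQA system would also have to extract features from the image itself; a baseline like this one is useful mainly as a check on how much of a dataset can be answered from question wording alone.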
- Crowd agents are crowd-powered systems in which a group of human workers, recruited online, collectively acts as a single agent that interacts with users in real time. In visual question answering, crowd-powered systems such as VizWiz and Chorus:View supply human answers where automated systems fall short.
- Image question answering (also known as visual question answering) is the task of generating a natural language response to a question about an input image or a sequence of images.
- Image-based question answering is another name for image question answering.
- Multimedia question answering is the task of generating a natural language response to a question based on multiple types of input media, such as images, videos, and audio files.
- Photo-based question answering is image-based question answering in which the input media are specifically photos: the system answers a question about an input photo or a sequence of photos.
Resources:
- vizwiz.org: allows blind users to receive quick answers to questions about their surroundings
See also:
Natural Language Image Recognition | Question Answering Systems | Scene Understanding & Natural Language 2013 | Scene Understanding & Natural Language 2014 | Text-to-Image Systems | TTSCS (Text-to-scene Conversion Systems)
Photo-based question answering T Yeh, JJ Lee, T Darrell – Proceedings of the 16th ACM international …, 2008 – dl.acm.org Abstract Photo-based question answering is a useful way of finding information about physical objects. Current question answering (QA) systems are text-based and can be difficult to use when a question involves an object with distinct visual features. A photo- …
Image Question Answering: A Visual Semantic Embedding Model and a New Dataset M Ren, R Kiros, R Zemel – arXiv preprint arXiv:1505.02074, 2015 – arxiv.org Abstract: This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use recurrent neural networks and visual semantic embeddings without intermediate stages such as object detection …
VQA: Visual Question Answering S Antol, A Agrawal, J Lu, M Mitchell, D Batra… – arXiv preprint arXiv: …, 2015 – arxiv.org Abstract: We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring many real-world scenarios, such …
Building a Large-scale Multimodal Knowledge Base for Visual Question Answering Y Zhu, C Zhang, C Ré, L Fei-Fei – arXiv preprint arXiv:1507.05670, 2015 – arxiv.org Abstract: The complexity of the visual world creates significant challenges for comprehensive visual understanding. In spite of recent successes in visual recognition, a lack of common sense knowledge and the insufficiencies of joint reasoning still leave a …
Increasing the bandwidth of crowdsourced visual question answering to better support blind users WS Lasecki, Y Zhong, JP Bigham – Proceedings of the 16th international …, 2014 – dl.acm.org Abstract Many of the visual questions that blind people ask cannot be easily answered with a single image or a short response, especially when questions are of an exploratory nature, e.g., what is in this area, or what tools are available on this work bench? We introduce …
Using real-time feedback to improve visual question answering Y Zhong, P Thiha, G He, W Lasecki… – CHI’12 Extended Abstracts …, 2012 – dl.acm.org Abstract Technology holds great promise for improving the everyday lives of people with disabilities; however, automated systems are prone to errors and cannot handle many real-world tasks. VizWiz, a system for answering visual questions for blind users, has shown …
An Ontology-Driven Visual Question-Answering Framework G Besbes, H Baazaoui-Zghal… – … Visualisation (iV), 2015 …, 2015 – ieeexplore.ieee.org Abstract: Question Answering systems aim at providing answers to natural language questions and provide a solution to the problem of response accuracy. This paper describes a visual QA framework based on ontologies, that relies on two main components: question …
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering H Gao, J Mao, J Zhou, Z Huang, L Wang… – arXiv preprint arXiv: …, 2015 – arxiv.org Abstract: In this paper, we present the mQA model, which is able to answer questions about the content of an image. The answer can be a sentence, a phrase or a single word. Our model contains four components: a Long-Short Term Memory (LSTM) to extract the …
Vision and language: from visual question answering to domain-invariant representations LE Castrejón Subira – 2015 – upcommons.upc.edu [English] The field of computer vision has radically evolved in the last few years due to the success of deep artificial neural networks. Current models have achieved remarkable performance in hallmark vision tasks such as object and scene recognition, and scientists …
Image-Based Question Answering with Visual Semantic Embeddings M Ren – 2015 – cs.utoronto.ca Abstract Computers understanding complex objects in an image and interacting with human through natural language are two open areas of research in computer vision and computational linguistics. A three-year-old child can describe what he/she sees and …
Multimedia questions and answering using web data mining DB Bhaskar, DK Singh – Information Communication and …, 2014 – ieeexplore.ieee.org … Tom Yeh, John J. Lee and Trevor Darrell were the first to present image based QA [7]. They describe a photo based question answering which is a useful way of finding information about physical objects. They develop a three layer system architecture for a photo based QA. …
Harvesting visual concepts for image search with complex queries L Nie, S Yan, M Wang, R Hong, TS Chua – Proceedings of the 20th ACM …, 2012 – dl.acm.org … Based on the proposed scheme, we introduce two applications: photo-based question answering and textual news visualization. …
Flower information retrieval using color feature and location-based system W Premchaiswadi, N Premchaiswadi – Proceedings of the 2nd …, 2011 – wseas.us … Photo-based question answering proposed by Tom Yeh et al. … 113-117, 2009. [17] http://www.hwildflower.com/ [18] http://www.onlineflowersearch.com [19] http://www.netvibes.com/ [20] T. Yeh, JJ Lee, T. Darrell, “Photo-based Question Answering”, ACM, MM’08, October, pp. …
Question answering from structured knowledge sources A Frank, HU Krieger, F Xu, H Uszkoreit… – Journal of Applied …, 2007 – Elsevier We present an implemented approach for domain-restricted question answering from structured knowledge sources, based on robust semantic analysis in a hybrid NLP.
On Available Corpora for Empirical Methods in Vision & Language F Ferraro, N Mostafazadeh, TH Huang… – arXiv preprint arXiv: …, 2015 – arxiv.org … Visual Question Answering (VQA) Dataset (Antol et al., 2015) is created for the task … Toronto COCO-QA Dataset (CQA) (Ren et al., 2015) is also a visual question answering dataset, where the questions are automatically generated from image captions of MS COCO dataset. …
Mobile Q&A: beyond text-only Q&A and privacy concerns U Lee, E Yi, M Ko – Proceedings of CHI, 2013 – mslab.kaist.ac.kr … Bigham et al. [1] proposed VizWiz, a talking application for mobile phones that supports photo-based question-answering for people with visual impairments. … In Proc. of CHI (2008). 10. Yeh, T., Lee, JJ, Darrell, T., Photo-based Question Answering, In Proc. of MM (2008).
Question answering based on pervasive agent ontology and Semantic Web Q Guo, M Zhang – Knowledge-Based Systems, 2009 – Elsevier Semantic Web technologies bring new benefits to knowledge-based question answering system. Especially, ontology is becoming the pivotal methodology to represent.
Breaking Microsoft’s CAPTCHA CHBLP Karthik, RA Recasens – 2015 – courses.csail.mit.edu … To close the loop, we propose some possible improvements to the Microsoft CAPTCHA to avoid the attacks presented in section 3 and section 4. In particular, we use a recent problem introduced to the computer vision community, Visual Question-Answering [8], to propose a …
Getting fast, free, and anonymous answers to questions asked by people with visual impairments E Brady – ACM SIGACCESS Accessibility and Computing, 2015 – dl.acm.org … VizWiz To examine how crowdsourcing could be useful for visual question answering, Bigham et al. … By analyzing the questions asked with VizWiz Social, we are able to draw conclusions about how to design human-powered access tools for visual question answering. …
Question answering and database querying: Bridging the gap with generalized quantification A Badia – Journal of Applied Logic, 2007 – Elsevier Even though Questions Answering and Database Querying have very different goals and frameworks, collaboration between the two fields could be mutually beneficial.
Learning to Answer Questions From Image using Convolutional Neural Network L Ma, Z Lu, H Li – arXiv preprint arXiv:1506.00333, 2015 – arxiv.org … question pair. question answering (VQA) (Antol et al., 2015) or image question answering (QA) (Malinowski et al., 2014a; Malinowski et al., 2014b; Malinowski et al., 2015a; Malinowski et al., 2015b; Ren et al., 2015). The image …
A Survey of Current Datasets for Vision and Language Research F Ferraro, N Mostafazadeh, THK Huang… – anthology.aclweb.org … 360,001 MadLib question and answers. • Visual Question Answering (VQA) Dataset … Toronto COCO-QA Dataset (CQA) (Ren et al., 2015) is also a visual question answering dataset, where the questions are automatically generated from image captions of MS COCO dataset. …
Question answering using statistical language modelling MH Heie, EWD Whittaker, S Furui – Computer Speech & Language, 2012 – Elsevier In this paper we present a statistical approach to question answering (QA). Our motivation is to build robust systems for many languages without the need for hi.
An exploration of space-time constraints on contextual information in image-based testing interfaces U Karadkar, M Nordt, R Furuta, C Lee… – Research and Advanced …, 2006 – Springer … a video and image corpus. In particular, we are investigating the role of image layouts and the contextual information embodied by these layouts in image-based question-answering tasks. Towards this purpose, we conducted …
Web image interpretation: semi-supervised mining annotated words F Wu, D Xia, Y Zhuang, H Zhang… – Multimedia and Expo, …, 2009 – ieeexplore.ieee.org … distinctive visual attributes. This paper intends to interpret images from the accurate annotated keywords. The similar work with our idea is text-to-image synthesis [11] and photo-based question answering [12]. Text-to-image …
Question answering for Biology M Neves, U Leser – Methods, 2015 – Elsevier Biologists often pose queries to search engines and biological databases to obtain answers related to ongoing experiments. This is known to be a time consuming,.
Image and Text Fusion for Context-aware Recommendation K Takada, M Iwasawa, M Kaneko… – … and Internet Based …, 2012 – ieeexplore.ieee.org … CONCEPTUAL IDEA OF CONTEXT-AWARE QUESTION ANSWERING SYSTEM In order to evaluate and verify the image and text fusion type platform, a context-aware(-image)-question-answering-system was developed. Fig. …
A Speech-Enabled Intelligent Agent for Pedestrian Navigation and Tourist Information S Janarthanam, O Lemon, P Bartie, T Dalmas… – Intelligent Virtual Agents: …, 2013 – Springer … In this paper, we present the architecture and features of our latest system, extended from an earlier version which was built and evaluated with real users. Keywords: mobile, speech, dialogue, geographical, visual, question-answering, GIS. …
Question answering with a conceptual framework for knowledge-based system development “Node of Knowledge” M Pavlić, ZD Han, A Jakupović – Expert Systems with Applications, 2015 – Elsevier The paper describes the development of a system for receiving questions from users and providing answers, which is a part of a larger knowledge-based (KB) syste.
Interactive search in image retrieval: a survey B Thomee, MS Lew – International Journal of Multimedia Information …, 2012 – Springer … QA with media information. In: Proceedings of ACM conference on research and development in information retrieval, pp 695–704; Yeh T, Lee J, Darrell T (2008) Photo-based question answering. In: Proceedings of ACM international …
Mobile image search for tourist information using ACCC algorithm W Premchaiswadi, A Tungkatsathan… – Personal Indoor and …, 2010 – ieeexplore.ieee.org … Tom Yeh [13] presented Photo-based question answering. It is a useful way of finding information about physical objects. … 387 – 391. [13] T. Yeh, JJ Lee, T. Darrell, “Photo-based Question Answering,” ACM, MM’08, October, 2008, pp. 26–31. …
Words and Pictures: An HCI Perspective TJ Siddiqui, US Tiwary – Proceedings of the First International Conference …, 2009 – Springer … [20] took these ideas further to include an image component in the question itself. They proposed three-layer architecture for photo-based question answering system as shown in Fig. … Yeh, T., Lee, JJ, Darrell, T.: Photo-based Question Answering, ACM Multimedia (2008) 21. …
VizWiz::LocateIt - enabling blind people to locate objects in their environment JP Bigham, C Jayant, A Miller, B White… – Computer Vision and …, 2010 – ieeexplore.ieee.org … Photo-based Question Answering enables users to ask questions that reference an included photograph, and tackles the very difficult problems of automatic computer vision and question answering [41]. … Photo-based question answering. In Proc. of the Intl. Conf. …
An Approach to Knowledge Discovery by Data Harvesting B Srinivas – IJSEAT, 2014 – ijseat.com … Rep. [7] H. Yang, T.-S. Chua, S. Wang, and C.-K. Koh, “Structured use of external knowledge for event-based open domain question answering,” in Proc. ACM Int. SIGIR Conf., 2003. [8] T. Yeh, JJ Lee, and T. Darrell, “Photo-based question answering,” in Proc. ACM Int. Conf. …
VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases F Sadeghi, SK Divvala… – Proceedings of the IEEE …, 2015 – viske.cs.washington.edu … Model [2] and other baselines. Visual Question Answering: The visual knowledge that we have gathered using our visual verification method can help improve the reasoning within question-answering systems. In this paper, we …
Modeling betweenness for question answering B Marshall – Proceedings of the third workshop on Exploiting …, 2010 – dl.acm.org … Flickr tag recommendation based on collective knowledge. In Proceedings of ACM WWW, pages 327–226, 2008. [10] T. Yeh, JJ Lee, and T. Darrell. Photo-based question answering. In Proceedings of the ACM international conference on Multimedia, pages 389–398, 2008.
Using Conventional MMQA To Automatically Annotate Media Entities KVV SubbaRao, VC Kumar – IJRCCT, 2014 – ijrcct.org … Rep [7] H. Yang, T.-S. Chua, S. Wang, and C.-K. Koh, “Structured use of external knowledge for event-based open domain question answering,” in Proc. ACM Int. SIGIR Conf., 2003. [8] T. Yeh, JJ Lee, and T. Darrell, “Photo-based question answering,” in Proc. ACM Int. Conf. …
Orchestrating Learning in a one-to-one Technology Classroom J Niramitranon, M Sharples, C Greenhalgh – New Science of Learning, 2010 – Springer …
Powering interactive intelligent systems with the crowd WS Lasecki – Proceedings of the adjunct publication of the 27th …, 2014 – dl.acm.org … about the world around them. Conceptually, this work extends VizWiz [2], a system for nearly real-time (within roughly 30 seconds to 1 minute) visual question answering from photographs. In VizWiz, the single-photograph requirement …
Architecting Real-Time Crowd-Powered Systems WS Lasecki, C Homan, JP Bigham – Human Computation, 2014 – repository.cmu.edu … Chorus:View (Lasecki et al., 2013b) used the Chorus conversational platform to extend VizWiz into a conversational visual question-answering assistant capable of giving real-time feedback to users regarding how to accurately frame their image. …
Multimedia Answer Generation for Community Question Answering Engine: A Review SD Ingale, RK Sorde, RR Deshmukh, SN Deshmukh – researchgate.net … T.-S. Chua, “Question answering over community contributed web video,” IEEE Multimedia, vol. 17, no. 4, pp. 46–57, 2010. [11] T. Yeh, JJ Lee, and T. Darrell, “Photo-based question answering,” in Proc. ACM Int. Conf. Multimedia, 2008. …
Image Metadata as a Means to Gather Image Relevant Information R Karlsen, RBA Hansen – Norsk informatikkonferanse, 2012 – tapironline.no … Our work is also related to image-based question answering [12] in the sense that we assume an implicit query: “Give me information related to this image”, where the relationship can be with respect to both location and image content. …
Image interpretation: mining the visible and syntactic correlation of annotated words D Xia, F Wu, W Liu, H Zhang – Journal of Zhejiang University SCIENCE A, 2009 – Springer … attributes. This paper aims to interpret images from the accurately annotated keywords. The work similar to our idea is text-to-image synthesis (Zhu et al., 2007b) and photo-based question answering (Yeh et al., 2008). Text-to …
FashionAsk: A Multimedia based Question-Answering System W Zhang, L Pang, CW Ngo – vireo.cs.cityu.edu.hk … 2003. [8] K. Wang, Z. Ming, and T.-S. Chua. A syntactic tree matching approach to finding similar questions in community-based qa services. In SIGIR, 2009. [9] T. Yeh, JJ Lee, and T. Darrell. Photo-based question answering. In ACM MM, 2008. [10] W. Zhang and C.-W. Ngo. …
Pororobot: A Deep Learning Robot That Plays Video Q&A Games KM Kim, CJ Nan, JW Ha, YJ Heo, BT Zhang – 2015 – researchgate.net … 70(0):53–64. Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., Xu, W. 2015. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. arXiv preprint arXiv:1505.05612. Girshick, R., Donahue, J., Darrell, T., Malik, J. 2014. …
Document information retrieval S Klink, K Kise, A Dengel, M Junker, S Agne – Digital Document …, 2007 – Springer … In following section we propose three document retrieval methods based on the techniques described above: a collaborative information retrieval approach, a question-answering system and the recently developed document image question-answering system called IQAS. …
The InfoAlbum image centric information collection R Karlsen, B Jakobsen – Proceedings of the International Conference on …, 2011 – dl.acm.org … Our work is related to image-based question answering [11] in the sense that we assume an implicit query: “Give me information related to this image”. The relationship between image and information can be with respect to i) location, and ii) content of image. …
Query expansion for hash-based image object retrieval YH Kuo, KT Chen, CH Chiang, WH Hsu – Proceedings of the 17th ACM …, 2009 – dl.acm.org … Figure 7(c)(d)). Such techniques also motivate many promising applications such as exploring photo collections in 3D [25], photo-based question answering [31], video advertising by image matching [21], annotation by search [27], etc. …
Video reference: question answering on YouTube G Li, Z Ming, H Li, TS Chua – Proceedings of the 17th ACM international …, 2009 – dl.acm.org … In SIGIR, 2009. [12] H. Yang, L. Chaisorn, Y. Zhao, S.-Y. Neo, and T.-S. Chua. VideoQA: question answering on news video. In ACM Multimedia, 2003. [13] T. Yeh, JJ Lee, and T. Darrell. Photo-based question answering. In ACM Multimedia, 2008.
A robust multivariate reranking algorithm for Question Answering enrichment Y Liu, J Liu, D Wang, J Cheng – Image Processing (ICIP), 2012 …, 2012 – ieeexplore.ieee.org … 5. ACKNOWLEDGEMENTS This work was supported in part by National Natural Science Foundation of China (Grant No. 61170127 and 60975010). 6. REFERENCES [1] T Yeh, JJ Lee, and T Darrell, “Photo-based question answering,” in ACM Multimedia, 2008. [2] TS. …
Uncover What You See in Your Images: The InfoAlbum approach. R Karlsen, B Jakobsen, RBA Hansen – IJCSA, 2012 – tmrfindia.org … Our work is related to image-based question answering [Yeh (2008)] in the sense …
Reports on the 2015 AAAI Workshop Series. SV Albrecht, JC Beck, DL Buckeridge, A Botea… – AI …, 2015 – search.ebscohost.com … commonsense knowledge is critical. (2) Integrated language and image question answering: The focus is on the comprehension of novel materials, such as videos, texts, photos, and podcasts. (3) Task-based perception and …
Interactive inquiry for object of interest in video playback by motion-augmented graph cut PN Tseng, YL Lin, WH Hsu – Proceedings of the international conference …, 2010 – dl.acm.org … Webified video: media conversion from TV program to web content and their integrated viewing method. In WWW, 2005. [2] T. Yeh, JJ Lee, and T. Darrell. Photo-based question answering. In ACM Multimedia, 2008. [3] Y. Boykov and M.-P. Jolly. …
Content-based visual search learned from social media X Li – 2012 – dare.uva.nl … IEEE Trans. Multimedia, 2011. in press. [144] T. Yeh, J. Lee, and T. Darrell. Photo-based question answering. In ACM Multimedia, 2008. [145] H. Yu, M. Li, H.-J. Zhang, and J. Feng. Color texture moment for content-based image retrieval. In ICIP, 2002. …
Exploiting of flickr note and its applications for social image sharing and search JW Jeong, HK Hong, DH Lee – Multimedia (ISM), 2011 IEEE …, 2011 – ieeexplore.ieee.org … User response prediction may provide meaningful information to image uploaders and service providers. 5) Visual question answering: 10% of Flickr note contain question & answer contents about the object occurred in the images. …
Interactive crowds: real-time crowdsourcing and crowd agents WS Lasecki, JP Bigham – Handbook of Human Computation, 2013 – Springer … information from prior interactions. Chorus:View. Chorus:View (Zhong et al. 2012) combines the ability to hold conversations with the crowd with a visual question answering service similar to VizWiz. By using streaming video …
A Novel Multimedia Question Answering Approach for Multimedia Answers By Yield Web Information M Krishna Goutham, SKA Nabi – ijmetmr.com … 37. Tom Yeh, John J. Lee, Trevor Darrell, Photo-based question answering, Proceedings of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada [doi>10.1145/1459359.1459412] …
Building Multi-Modal Relational Graphs for Multimedia Retrieval JR Shieh, CY Lin, SX Wang, JL Wu – … Applications and Processing, 2013 – books.google.com … (2004), they proposed techniques for recognizing locations by Web searching. Users can optionally provide texts to aid the search. In addition, was also developed a photo-based question answering system to help people finding useful information given an image. …
Using Thought-Provoking Children’s Questions to Drive Artificial Intelligence Research ET Mueller, H Minsky – arXiv preprint arXiv:1508.06924, 2015 – arxiv.org … BrainPlay material. References [Antol et al. 2015] Antol, S.; Agrawal, A.; Lu, J.; Mitchell, M.; Batra, D.; Zitnick, CL; and Parikh, D. 2015. VQA: Visual question answering. CoRR abs/1505.00468. [Clark 2015] Clark, P. 2015. Elementary …
From text question-answering to multimedia QA on web-scale media resources TS Chua, R Hong, G Li, J Tang – Proceedings of the First ACM workshop …, 2009 – dl.acm.org … [21] TRECVID: a video evaluation forum organized in conjunction with TREC. See http://trecvid.nist.org/. [22] T. Yeh, JJ Lee, T. Darrell. “Photo-based Question Answering”, ACM Multimedia, 2008. [23] EM Voorhees. 2001. Overview of the TREC 2001 Question Answering …
Mental Images of Text: Learning Document Similarity using Web Photos G Bertasius, L Torresani – seas.upenn.edu … ACM. [14] Tom Yeh, John J Lee, and Trevor Darrell. Photo-based question answering. In Proceedings of the 16th ACM international conference on Multimedia, MM ’08, pages 389–398, New York, NY, USA, 2008. ACM. …
Exploring large scale data for multimedia QA: an initial study R Hong, G Li, L Nie, J Tang, TS Chua – proceedings of the ACM …, 2010 – dl.acm.org … 2009. [15] T. Yeh, JJ Lee, T. Darrell. Photo-based Question Answering. Proc. of the 16th ACM international Conference on Multimedia. Vancouver, Canada. 2008. [16] R. Hong, J. Tang, HK Tan, S. Yan, C.-W. Ngo, T.-S. Chua. Event Driven Summarization for Web Videos. …
Cross domain search by exploiting wikipedia C Liu, S Wu, S Jiang, AKH Tung – Data Engineering (ICDE), …, 2012 – ieeexplore.ieee.org Chen Liu, Sai Wu, Shouxu Jiang, Anthony KH Tung; School of Computing, National University of Singapore …
Multimedia question answering TS Chua, R Hong, J Tang – Scholarpedia, 2010 – scholarpedia.org … Ye, S., Chua, T.-S., and Lu, J. (2009). Summarizing definition from Wikipedia. Proc. ACL. Yeh, T., Lee, JJ, and Darrell, T. (2008). Photo-based question answering. Proc. ACM Multimedia. Zha, ZJ, Yang, L., Mei, T., Wang, M., Wang, Z. (2009). Visual Query Suggestion. Proc. …
VizWiz: nearly real-time answers to visual questions JP Bigham, C Jayant, H Ji, G Little, A Miller… – Proceedings of the …, 2010 – dl.acm.org Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, and Tom Yeh …
Describing Common Human Visual Actions in Images MR Ronchi, P Perona – arXiv preprint arXiv:1506.02203, 2015 – arxiv.org Matteo Ruggero Ronchi, Pietro Perona; California Institute of Technology. Which common human actions and interactions are recognizable in monocular still images? …
Learning Common Sense Through Visual Abstraction R Vedantam, X Lin, T Batra, CL Zitnick, D Parikh – 2015 – filebox.ece.vt.edu Ramakrishna Vedantam, Xiao Lin, Tanmay Batra, C. Lawrence Zitnick, Devi Parikh; Virginia Tech, Carnegie Mellon University, Microsoft Research …
Accelerating Very Deep Convolutional Networks for Classification and Detection X Zhang, J Zou, K He, J Sun – arXiv preprint arXiv:1505.06798, 2015 – arxiv.org Abstract: This paper aims to accelerate the test-time computation of convolutional …
Social microvolunteering: quick, free answers to visual questions from blind people E Brady – 2015 – urresearch.rochester.edu Doctoral dissertation by Erin Brady, supervised by Professor Jeffrey P. Bigham, Department of Computer Science …
Image retrieval: Research and use in the information explosion M Inoue – Progress in Informatics, 2009 – nii.ac.jp Survey paper in the special issue “Leading ICT technologies in the Information Explosion”, Progress in Informatics, No. 6, pp. 3–14 (2009). Masashi Inoue, National Institute of Informatics …
Advanced Community Question Answering Sites with Multimedia Answers A Sekhar, ST Shini, P Kumar – ijettjournal.org … [11] H. Yang, T.-S. Chua, S. Wang, and C.-K. Koh, “Structured use of external knowledge for event-based open domain question answering,” in Proc. ACM Int. SIGIR Conf., 2003 [12] T. Yeh, JJ Lee, and T. Darrell, “Photo-based question answering,” in Proc. ACM Int. Conf. …
Tag relevance fusion for social image retrieval X Li – Multimedia Systems, 2014 – Springer DOI 10.1007/s00530-014-0430-9 … answering [47], and photo-based advertisements [22], to name a few. …
Boosting image object retrieval and indexing by automatically discovered pseudo-objects KT Chen, KH Lin, YH Kuo, YL Wu, WH Hsu – Journal of Visual …, 2010 – Elsevier State-of-the-art object retrieval systems are mostly based on the bag-of-visual-words representation which encodes local appearance information of an image in a.
RegionSpeak: Quick Comprehensive Spatial Descriptions of Complex Images for Blind Users Y Zhong, WS Lasecki, E Brady… – Proceedings of the 33rd …, 2015 – cs.rochester.edu Yu Zhong, Walter S. Lasecki, Erin Brady, Jeffrey P. Bigham; Computer Science, ROC HCI, University of Rochester …
Snap-and-ask: answering multimodal question by naming visual instance W Zhang, L Pang, CW Ngo – Proceedings of the 20th ACM international …, 2012 – dl.acm.org Wei Zhang, Lei Pang, Chong-Wah Ngo; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong …
Beyond text qa: Multimedia answer generation by harvesting web information L Nie, M Wang, Y Gao, ZJ Zha… – … , IEEE Transactions on, 2013 – ieeexplore.ieee.org …
[BOOK] Content-based visual search learned from social media X Li – 2012 – dare.uva.nl Thesis, available from UvA-DARE, the institutional repository of the University of Amsterdam (UvA): http://hdl.handle.net/11245/2.103390 …
A novel strategy for recommending multimedia objects and its application in the cultural heritage domain M Albanese, A d’Acierno, V Moscato, F Persia… – 2011 – books.google.com Chapter 15. Massimiliano Albanese, George Mason University, USA; Antonio d’Acierno, ISA, National …
RFID-based interactive multimedia system for the children A Karime, MA Hossain, ASMM Rahman… – Multimedia Tools and …, 2012 – Springer Ali Karime, M. Anwar Hossain, ASM Mahfujur Rahman, Wail Gueaieb, Jihad Mohamed Alja’am, Abdulmotaleb El Saddik. Published online: 18 March 2011 …
Opinion Question Answering by Sentiment Clip Localization L Pang, CW Ngo – vireo.cs.cityu.edu.hk Lei Pang and Chong-Wah Ngo, City University of Hong Kong. This paper considers multimedia question answering beyond factoid and how-to questions. …
Combining Multimodal External Resources For Event-Based News Video Retrieval And Question Answering Shi-Yong Neo – 2008 – scholarbank.nus.edu.sg Shi-Yong Neo (B. Comp (Honors), National University of Singapore) …