SQuAD (Stanford Question Answering Dataset)

Notes:

The Stanford Question Answering Dataset (SQuAD) is a widely used reading comprehension dataset for natural language processing (NLP) and machine learning research. It consists of more than 100,000 questions and answers on a range of topics, all drawn from Wikipedia articles.

In SQuAD, each question is posed by a crowdworker on a specific Wikipedia article, and the answer to each question is a segment of text, or span, from the corresponding reading passage. The goal of the dataset is to test the ability of machine learning models to understand and answer questions about the content of a given passage of text.
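
To make the span idea concrete, here is a minimal Python sketch. The field names (`context`, `question`, `answers`, `text`, `answer_start`) follow the published SQuAD v1.1 JSON layout, where each answer is stored as its text plus a character offset into the passage. The passage, the question, and the helper name `extract_span` are invented for illustration and are not taken from the dataset.

```python
# Illustrative passage and answer; NOT an actual SQuAD entry.
context = "The Amazon rainforest covers much of the Amazon basin of South America."
answer_text = "South America"

# A SQuAD-style record. In the real files the offset is stored as a number;
# here it is computed so the toy example cannot drift out of sync.
record = {
    "context": context,
    "question": "Which continent is the Amazon basin part of?",
    "answers": [
        {"text": answer_text, "answer_start": context.index(answer_text)}
    ],
}


def extract_span(context: str, answer: dict) -> str:
    """Recover the answer by slicing the passage at the stored character offset."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return context[start:end]


# The answer is always a contiguous substring of the passage.
span = extract_span(record["context"], record["answers"][0])
print(span)
```

Because every answer is a substring of the passage, a model can be framed as predicting a start and an end position rather than generating free text.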

SQuAD has been widely used in NLP research as a benchmark for evaluating and comparing machine learning models on reading comprehension tasks. It has also served as training data for models that answer questions about a wide range of topics and contexts, and it has played a significant role in advancing question answering research.
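
Model comparisons on SQuAD are usually reported as exact match and token-level F1 between a predicted answer and a reference answer. The sketch below is a simplified approximation of that scoring, assuming the usual normalization (lowercasing, dropping punctuation and the articles a/an/the). The function names are illustrative, and this is not the official evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(truth)


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower."))
print(f1_score("in South America", "South America"))
```

When a question has several reference answers, a common convention is to score the prediction against each one and keep the maximum.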

Resources:

stanford-qa.com .. the Stanford Question Answering Dataset
Toronto COCO-QA dataset .. automatically generated from image captions

Wikipedia:

Question-focused dataset

See also:

Visual Question Answering

SQuAD: 100,000+ Questions for Machine Comprehension of Text P Rajpurkar, J Zhang, K Lopyrev, P Liang – arXiv preprint arXiv: …, 2016 – arxiv.org … Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev and Percy Liang, Computer Science Department, Stanford University …

Machine Comprehension Using Match-LSTM and Answer Pointer S Wang, J Jiang – arXiv preprint arXiv:1608.07905, 2016 – arxiv.org … Machine comprehension of text is an important problem in natural language processing. A recently released dataset, the Stanford Question Answering Dataset (SQuAD), offers a large number of real questions and their answers created by humans through crowdsourcing. …

Dynamic Coattention Networks For Question Answering C Xiong, V Zhong, R Socher – arXiv preprint arXiv:1611.01604, 2016 – arxiv.org … On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1. …

End-to-End Reading Comprehension with Dynamic Answer Chunk Ranking Y Yu, W Zhang, K Hasan, M Yu, B Xiang… – arXiv preprint arXiv: …, 2016 – arxiv.org … The experiments on the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al. 2016), which contains a variety of human-generated factoid and non-factoid questions, have shown the effectiveness of above three contributions. …

Words or Characters? Fine-grained Gating for Reading Comprehension Z Yang, B Dhingra, Y Yuan, J Hu, WW Cohen… – arXiv preprint arXiv: …, 2016 – arxiv.org … The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset collected recently (Rajpurkar et al., 2016). It contains 23,215 paragraphs come from 536 Wikipedia articles. Unlike …