Open Information Extraction

Notes:

Open Information Extraction (OpenIE) is a natural language processing (NLP) technique for extracting structured information from unstructured text. OpenIE systems identify relationships between entities mentioned in text, such as the subject and object of a sentence or the cause and effect of an event, without requiring a pre-specified vocabulary of relations. The extractions are typically represented as a list of triples (subject, predicate, object) that other software systems can easily process and analyze.

OpenIE is often used as a preprocessing step in a larger NLP pipeline, where it extracts relevant information from a large corpus of text and prepares it for further analysis. For example, OpenIE might be used to extract assertions involving names, dates, and locations from a set of news articles, or product details from a set of online product listings.

OpenIE systems are typically rule-based, learned from data, or a combination of the two; learned components can be trained on large datasets to improve their accuracy. OpenIE is used in a variety of applications, including information retrieval, question answering, and text summarization.
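
To make the rule-based idea concrete, here is a minimal sketch of a pattern-driven extractor. It is a toy under stated assumptions, not the method of any system listed on this page: the regular expression, the small list of prepositions, and the function name `extract_triple` are all hypothetical and chosen only for illustration. Real systems generally rely on part-of-speech tags, parses, or learned models rather than a single regex.

```python
import re

# Hypothetical toy rule: a capitalised name, then a relation phrase that
# starts with "is"/"was" and ends in a preposition, then the remaining text.
PATTERN = re.compile(
    r"^(?P<subject>[A-Z][\w.]*(?: [A-Z][\w.]*)*) "
    r"(?P<predicate>(?:is|was) .*? (?:of|in|at|by)) "
    r"(?P<object>.+?)\.?$"
)


def extract_triple(sentence):
    """Return a (subject, predicate, object) tuple, or None if the rule does not fire."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return (match["subject"], match["predicate"], match["object"])


example = extract_triple("Barack Obama was the 44th President of the United States.")
```

A single hand-written rule like this covers very few sentence shapes, which is one reason practical extractors combine many patterns or learn them from data.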

OpenIE can be used to populate a knowledge base, a collection of structured data that represents the facts, concepts, and relationships within a specific domain or subject area. Populating a knowledge base involves extracting relevant information from a variety of sources and storing it in a form that is easy to access and query.

The extracted information is stored in the knowledge base as triples (subject, predicate, object). For example, OpenIE might extract the following triple from the sentence “Barack Obama was the 44th President of the United States”:

Subject: Barack Obama
Predicate: was the 44th President of
Object: the United States

This triple could then be stored in the knowledge base as a single fact.
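
As an illustration of storing such a fact, the following sketch keeps triples in a small in-memory structure indexed by subject. The `Triple` and `KnowledgeBase` names and the `facts_about` method are hypothetical, introduced only for this example; a real knowledge base would more likely use a database or triple store.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact in (subject, predicate, object) form."""
    subject: str
    predicate: str
    obj: str


class KnowledgeBase:
    """Hypothetical in-memory store that indexes triples by subject."""

    def __init__(self):
        # Sets drop exact duplicates, which repeated extractions often produce.
        self._by_subject = defaultdict(set)

    def add(self, triple):
        self._by_subject[triple.subject].add(triple)

    def facts_about(self, subject):
        # .get avoids creating an empty entry for unknown subjects.
        return sorted(self._by_subject.get(subject, set()))


kb = KnowledgeBase()
fact = Triple("Barack Obama", "was the 44th President of", "the United States")
kb.add(fact)
kb.add(fact)  # adding the same triple again leaves the store unchanged
facts = kb.facts_about("Barack Obama")
```

Indexing by subject is only one possible design choice; indexing by predicate or object as well would support other kinds of queries at the cost of extra bookkeeping.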

OpenIE is particularly useful for populating a knowledge base from large volumes of unstructured sources, such as text documents, articles, and social media posts. It can also be targeted at specific types of information, such as names, dates, and locations, which helps with organizing and querying the knowledge base. Overall, OpenIE can be a powerful tool for building and maintaining large, comprehensive knowledge bases.

Resources:

ClausIE .. identifies and extracts relations and their arguments in natural language text
github.com/knowitall .. autonomous domain-independent systems that extract information from the Web
knowitall.github.io/ollie .. automatically identifies and extracts binary relationships from English sentences
openie.allenai.org .. Open IE project page at the Allen Institute for Artificial Intelligence (Oren Etzioni)
knowitall.github.io/openie .. Open IE 4.0 is the successor to ReVerb and Ollie
relgrams.cs.washington.edu .. a rel-gram is a pair of open-domain relational tuples (T,T’)
reverb.cs.washington.edu .. automatically identifies and extracts binary relationships from English sentences
thewikimachine.fbk.eu .. automatically links the most relevant terms and entities in your documents

References:

Open information extraction for the web M Banko, MJ Cafarella, S Soderland, M Broadhead… – IJCAI, 2007 – aaai.org Abstract Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (eg, extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to … Cited by 983 Related articles All 26 versions

Open information extraction from the web O Etzioni, M Banko, S Soderland, DS Weld – Communications of the ACM, 2008 – dl.acm.org … of MUC-3 and MUC-4 was Latin-American Terrorism, and the task was to fill templates with information about specific … Cited by 314 Related articles All 18 versions

Identifying relations for open information extraction A Fader, S Soderland, O Etzioni – Proceedings of the Conference on …, 2011 – dl.acm.org Abstract Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-of-the-art Open IE systems is rife with uninformative and incoherent extractions. To … Cited by 381 Related articles All 17 versions

Open information extraction J Zhao, K Liu, G Zhou, L Cai – Journal of Chinese Information …, 2011 – en.cnki.com.cn The research on information extraction is being developed into open information extraction, ie extracting open categories of entities, relations and events from open domain text resources. The methods used are also transferred from pure statistical machine learning … Cited by 6 Related articles

Open information extraction using Wikipedia F Wu, DS Weld – Proceedings of the 48th Annual Meeting of the …, 2010 – dl.acm.org Abstract Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, … Cited by 261 Related articles All 22 versions

Textrunner: open information extraction on the web A Yates, M Cafarella, M Banko, O Etzioni… – Proceedings of Human …, 2007 – dl.acm.org Abstract Traditional information extraction systems have focused on satisfying precise, narrow, pre-specified requests from small, homogeneous corpora. In contrast, the TextRunner system demonstrates a new kind of information extraction, called Open … Cited by 146 Related articles All 11 versions

Open Information Extraction: The Second Generation. O Etzioni, A Fader, J Christensen, S Soderland… – IJCAI, 2011 – cs.washington.edu How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction … Cited by 191 Related articles All 15 versions

Using wikipedia to bootstrap open information extraction DS Weld, R Hoffmann, F Wu – ACM SIGMOD Record, 2009 – dl.acm.org We often use ‘Data Management’ to refer to the manipulation of relational or semi-structured information, but much of the world’s data is unstructured, for example the vast amount of natural-language text on the Web. The ability to manage the information underlying this … Cited by 64 Related articles All 14 versions

Open language learning for information extraction M Schmitz, R Bart, S Soderland, O Etzioni – Proceedings of the 2012 …, 2012 – dl.acm.org … Abstract Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However … Cited by 125 Related articles All 13 versions

ClausIE: clause-based open information extraction L Del Corro, R Gemulla – … of the 22nd international conference on World …, 2013 – dl.acm.org Abstract We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of “useful … Cited by 50 Related articles All 18 versions

Adapting open information extraction to domain-specific relations S Soderland, B Roof, B Qin, S Xu, O Etzioni – AI magazine, 2010 – aaai.org Abstract Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target … Cited by 37 Related articles All 9 versions

Kraken: N-ary facts in open information extraction A Akbik, A Löser – Proceedings of the Joint Workshop on Automatic …, 2012 – dl.acm.org Abstract Current techniques for Open Information Extraction (OIE) focus on the extraction of binary facts and suffer significant quality loss for the task of extracting higher order N-ary facts. This quality loss may not only affect the correctness, but also the completeness of an … Cited by 28 Related articles All 15 versions

Dependency-based open information extraction P Gamallo, M Garcia, S Fernández-Lanza – Proceedings of the Joint …, 2012 – dl.acm.org Abstract Building shallow semantic representations from text corpora is the first step to perform more complex tasks such as text entailment, enrichment of knowledge bases, or question answering. Open Information Extraction (OIE) is a recent unsupervised strategy … Cited by 30 Related articles All 10 versions

Semantic role labeling for open information extraction J Christensen, S Soderland, O Etzioni – … of the NAACL HLT 2010 First …, 2010 – dl.acm.org Abstract Open Information Extraction is a recent paradigm for machine reading from arbitrary text. In contrast to existing techniques, which have used only shallow syntactic features, we investigate the use of semantic features (semantic roles) for the task of Open IE. We … Cited by 27 Related articles All 22 versions

Integrating syntactic and semantic analysis into the open information extraction paradigm A Moro, R Navigli – Proceedings of the Twenty-Third international joint …, 2013 – dl.acm.org Abstract In this paper we present an approach aimed at enriching the Open Information Extraction paradigm with semantic relation ontologization by integrating syntactic and semantic features into its workflow. To achieve this goal, we combine deep syntactic … Cited by 17 Related articles All 8 versions

An analysis of open information extraction based on semantic role labeling J Christensen, S Soderland, O Etzioni – Proceedings of the sixth …, 2011 – dl.acm.org Abstract Open Information Extraction extracts relations from text without requiring a pre-specified domain or vocabulary. While existing techniques have used only shallow syntactic features, we investigate the use of semantic role labeling techniques for the task of Open … Cited by 16 Related articles All 10 versions

Open Information Extraction with Tree Kernels. Y Xu, MY Kim, K Quinn, R Goebel, D Barbosa – HLT-NAACL, 2013 – aclweb.org Abstract Traditional relation extraction seeks to identify pre-specified semantic relations within natural language text, while open Information Extraction (Open IE) takes a more general approach, and looks for a variety of relations without restriction to a fixed relation … Cited by 17 Related articles All 11 versions

Filtering and clustering relations for unsupervised information extraction in open domain W Wang, R Besançon, O Ferret, B Grau – Proceedings of the 20th ACM …, 2011 – dl.acm.org … Information Extraction has recently been extended to new areas by loosening the constraints on the strict definition of the extracted information and allowing to design more open information extraction systems. … Cited by 24 Related articles All 4 versions

Open information extraction via contextual sentence decomposition H Bast, E Haussmann – Semantic Computing (ICSC), 2013 …, 2013 – ieeexplore.ieee.org Abstract—We show how contextual sentence decomposition (CSD), a technique originally developed for high-precision semantic search, can be used for open information extraction (OIE). Intuitively, CSD decomposes a sentence into the parts that semantically “belong … Cited by 8 Related articles All 7 versions

Exploiting semantic annotations for open information extraction: an experience in the biomedical domain V Nebot, R Berlanga – Knowledge and information Systems, 2014 – Springer Abstract The increasing amount of unstructured text published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, … Cited by 9 Related articles All 8 versions

Integrating Open and Closed Information Extraction: Challenges and First Steps. A Dutta, C Meilicke, M Niepert… – NLP- …, 2013 – publications.wim.uni-mannheim.de … In this paper, we introduce the integration of open information extraction projects with Wikipedia-based IE projects that maintain a logical schema, as an important challenge for the NLP, semantic web, and machine learning communities. … Cited by 9 Related articles All 7 versions

Joint Inference: a Statistical Approach for Open Information Extraction Y Liu, B Yang – Appl. Math, 2012 – amis.naturalspublishing.com Abstract: In recent decades, natural language processing has great progress. Better model of each sub-problem achieves 90% accuracy or better, such as part-of-speech tagging and phrase chunking. However, success in integrated, end-to-end natural language … Cited by 5 Related articles All 15 versions

LODIE: Linked Open Data for Web-scale Information Extraction. F Ciravegna, AL Gentile, Z Zhang – SWAIE, 2012 – ceur-ws.org … seed data. While these systems learn to extract predefined types of information based on (limited) training data, the TextRunner [2] system proposes the “Open Information Extraction”, a new paradigm that exploits … Cited by 10 Related articles All 5 versions

Coreference Resolution based on Probabilistic Graphical Model for Open Information Extraction. Y Liu, C Ouyang, B Yang – International Journal of …, 2012 – search.ebscohost.com Abstract Traditional Information Extraction (IE) methods require hand-tagged relation examples. Machine learning methods to coreference resolution that is sub-problem of Traditional Information Extraction also require human-tagged data and are supervised. … Cited by 4 Related articles All 2 versions

Improving open information extraction for informal web documents with ripple-down rules MH Kim, P Compton – … Management and Acquisition for Intelligent Systems, 2012 – Springer Abstract The World Wide Web contains a massive amount of information in unstructured natural language and obtaining valuable information from informally written Web documents is a major research challenge. One research focus is Open Information Extraction (OIE) … Cited by 4 Related articles All 5 versions

Open-domain Multi-Document summarization via information extraction: Challenges and prospects H Ji, B Favre, WP Lin, D Gillick, D Hakkani-Tur… – Multi-source, …, 2013 – Springer … The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. References. 1. Banko, M., Cafarella, MJ, Soderland, S., Etzioni, O.: Open information extraction from the web. … Cited by 9 Related articles All 8 versions

More informative open information extraction via simple inference H Bast, E Haussmann – Advances in Information Retrieval, 2014 – Springer Abstract Recent Open Information Extraction (OpenIE) systems utilize grammatical structure to extract facts with very high recall and good precision. In this paper, we point out that a significant fraction of the extracted facts is, however, not informative. For example, for the … Cited by 3 Related articles All 3 versions

Open information extraction based on lexical-syntactic patterns C Castella Xavier, S de Lima, V Lúcia… – Intelligent Systems ( …, 2013 – ieeexplore.ieee.org Abstract—Open Information Extraction (Open IE) is an unsupervised strategy to draw out relations from text without predefining these relations, regardless the domain. This paper describes a novel Open IE approach that performs unsupervised extraction of triples by … Cited by 4 Related articles All 5 versions

Boosting open information extraction with noun-based relations C Xavier, VS Lima – Proceedings of the ninth international conference …, 2014 – lrec-conf.org Abstract Open Information Extraction (Open IE) is a strategy for learning relations from texts, regardless the domain and without predefining these relations. Work in this area has focused mainly on verbal relations. In order to extend Open IE to extract relationships that … Cited by 2 Related articles

Open information extraction from biomedical literature using predicate-argument structure patterns N Nguyen, M Miwa, Y Tsuruoka, S Tojo – Proceedings of The 5th …, 2013 – jaist.ac.jp Abstract In this paper, we propose an open information extraction (Open IE) system, which attempts to extract relations (or facts) of any type from biomedical literature. What distinguishes our system from existing Open IE systems is that it uses predicate-argument … Cited by 3 Related articles

Semantifying triples from open information extraction systems A Dutta, C Meilicke… – … 2014: Proceedings of the …, 2014 – books.google.com Abstract. The last few years have witnessed some remarkable success of the state-of-the-art unsupervised knowledge extraction systems like NELL and REVERB. These systems are gifted with typically web-scale coverage but are often plagued with ambiguity due to lack … Cited by 2 Related articles All 4 versions

An Overview of Open Information Extraction P Gamallo – Maria João Varanda Pereira José Paulo Leal, 2014 – drops.dagstuhl.de Abstract Open Information Extraction (OIE) is a recent unsupervised strategy to extract great amounts of basic propositions (verb-based triples) from massive text corpora which scales to Web-size document collections. We will introduce the main properties of this extraction … Cited by 1 Related articles All 5 versions

Joint Inference Open Information Extraction Based on Markov Logic Networks [J] LIU Yong-bin YANG Bing-ru, LIGLIU Ying-hua – Computer Science, 2012 – en.cnki.com.cn In recent decades, natural language processing has made great progress. Better model of each sub-problem achieves 90% accuracy or better. However, success in integrated, end-to-end natural language understanding remains elusive. The main reasons are that the … Cited by 1 Related articles

Towards Enriching Linked Open Data via Open Information Extraction. A Koukourikos, V Karkaletsis, GA Vouros – KNOW@ LOD, 2012 – cru.iit.demokritos.gr Abstract. The descriptions of various entities on Linked Data repositories are subject to constant renewals and modifications, with respect to both the descriptions of concepts and relations and entities realizing their instantiations. Thus, the underlying ontologies have to … Cited by 1 Related articles All 5 versions

Entity Linking for Open Information Extraction A Dutta, M Schuhmacher – Natural Language Processing and Information …, 2014 – Springer Abstract Open domain information extraction (OIE) projects like Nell or ReVerb are often impaired by a schema-poor structure. This severely limits their application domain in spite of having web-scale coverage. In this work we try to disambiguate an OIE fact by referring its … Cited by 1 Related articles All 4 versions

A Cross-lingual Annotation Projection-based Self-supervision Approach for Open Information Extraction. S Kim, M Jeong, J Lee, GG Lee – IJCNLP, 2011 – isoft.postech.ac.kr Abstract Open information extraction (IE) is a weakly supervised IE paradigm that aims to extract relation-independent information from large-scale natural language documents without significant annotation efforts. A key challenge for Open IE is to achieve self- … Cited by 1 Related articles All 5 versions

Entity-Centric Coreference Resolution of Person Entities for Open Information Extraction M Garcia, P Gamallo – Procesamiento del Lenguaje Natural, 2014 – journal.sepln.org Abstract This work presents a coreference resolution system of person entities based on a multi-pass architecture which sequentially applies a set of independent modules, using an entity-centric approach. Several evaluations show that the system obtains promising … Cited by 4 Related articles All 4 versions

Semantics-aware open information extraction in the biomedical domain V Nebot, R Berlanga – Proceedings of the 4th International Workshop on …, 2011 – dl.acm.org Abstract The increasing amount of biomedical scientific literature published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on recognizing well-defined … Cited by 2 Related articles All 2 versions

SRDF: Korean Open Information Extraction using Singleton Property S Nam, Y Hahm, S Nam, KS Choi – semanticweb.kaist.ac.kr Abstract. In this paper, we propose a new Korean Open Information Extraction system so-called SRDF. The SRDF system has been designed to effectively extract reified triples from Korean natural language texts based on the use of singleton property and other natural …

Using Open Information Extraction and Linked Open Data towards Ontology Enrichment and Alignment A Koukourikos, P Karampiperis, G Vouros… – Advanced Information …, 2012 – Springer Abstract The interlinking, maintenance and updating of different Linked Data repositories is steadily becoming a critical issue as the amount of published data increases. The wealth of information across the World Wide Web can be exploited in order to provide additional … Cited by 1 Related articles All 6 versions

Improving Open Information Extraction using Domain Knowledge CK Emani, CF Da Silva, B Fies, P Ghodous – ceur-ws.org Abstract. Open Information Extraction (OIE) aims to identify all the possible assertions within a sentence. Recent and thus the most efficient OIE-tools use the grammatical dependencies or the syntactic tree of the sentence to perform extraction. When they provide a wrong … Related articles

An Overview of Open Information Extraction (Invited talk) P Gamallo – 3rd Symposium on Languages, Applications and … – drops.dagstuhl.de Abstract Open Information Extraction (OIE) is a recent unsupervised strategy to extract great amounts of basic propositions (verb-based triples) from massive text corpora which scales to Web-size document collections. We will introduce the main properties of this extraction … Cited by 2 Related articles All 2 versions

Open Information Extraction to KBP Relations in 3 Hours S Soderland, J Gilmer, R Bart, O Etzioni, DS Weld – nist.gov Abstract We participated in both the English Slot Filling and Entity Linking in the 2013 TAC-KBP evaluation. Our Slot Filling system provides an answer to the following conjectures: Can Open Information Extraction (Open IE) form the basis of a high precision extractor for … Cited by 2 Related articles All 4 versions

Open Information Extraction from Real Internet Texts in Spanish Using Constraints over Part-Of-Speech Sequences: Problems of the Method, Their Causes, and … A Zhila, A Gelbukh – gelbukh.g-sidorov.org Abstract: Usually we do not know the domain of an arbitrary text from the Internet, or the semantics of the relations it conveys. While humans identify such information easily, for a computer this task is far from straightforward. The task of detecting relations of arbitrary …

Improving Scalability of Discriminative Learning Markov Logic Networks for Open Information Extraction. CP Ouyang, YB Liu, X Yang, Y Yu… – International Journal of …, 2013 – search.ebscohost.com Abstract Traditional information extraction (IE) methods need to manually tag relations examples. With the rapid development of web technology, especially proliferation of mobile devices, traditional information extraction technology is facing new challenges. The … Related articles All 2 versions

A Lexicalized Tree Kernel for Open Information Extraction Y Xu, C Ringlstetter, MY Kim, R Goebel, G Kondrak… – Volume 2: Short Papers – aclweb.org Abstract In contrast with traditional relation extraction, which only considers a fixed set of relations, Open Information Extraction (Open IE) aims at extracting all types of relations from text. Because of data sparseness, Open IE systems typically ignore lexical information, …

[BOOK] Exploiting knowledge in unsupervised open information extraction Y Merhav – 2012 – dl.acm.org Abstract The extraction of structured information from text is a long-standing challenge in Natural Language Processing (NLP) which has been reinvigorated with the ever-increasing availability of user-generated textual content Online. The ability to extract interesting and … All 6 versions

Open information extraction based on lexical semantics CC Xavier, VLS de Lima, M Souza – Journal of the Brazilian Computer …, 2015 – Springer Abstract Background Open Information Extraction (Open IE) aims to obtain not predefined, domain-independent relations from text. This article introduces the Open IE research field, thoroughly discussing the main ideas and systems in the area as well as its main … Related articles All 3 versions

Entity-Centric Coreference Resolution of Person Entities for Open Information Extraction M García González, P Gamallo Otero – 2014 – rua.ua.es This work presents a coreference resolution system of person entities based on a multi-pass architecture which sequentially applies a set of independent modules, using an entity-centric approach. Several evaluations show that the system obtains promising results in different … Related articles

A weighting scheme for open information extraction Y Merhav – Proceedings of the 2012 Conference of the North …, 2012 – dl.acm.org Abstract We study the problem of extracting all possible relations among named entities from unstructured text, a task known as Open Information Extraction (Open IE). A state-of-the-art Open IE system consists of natural language processing tools to identify entities and … Cited by 1 Related articles All 9 versions

Towards an Architecture for Open-domain Information Extraction: Integrated Extraction, Clustering, and Reasoning with Patterns A Boldyrev, M Theobald, G Weikum – 2012 – people.mpi-inf.mpg.de … open information extraction system. It extracts automatically a large set of non-canonicalized relational tuples and computes their probabilities. … threshold value is used. The ReVerb has three major differences from open information extraction systems such as TextRunner: … Related articles All 4 versions

Learning non-verbal relations under open information extraction paradigm CC Xavier – 2014 – repositorio.pucrs.br The Open Information Extraction (Open IE) paradigm of relation extraction works by identifying relations that are not defined in advance, seeking to overcome the limitations imposed by the traditional methods for the Extraction of … Related articles All 6 versions

Open Information Extraction for Spanish Language based on Syntactic Constraints A Zhila, A Gelbukh – ACL 2014, 2014 – aclweb.org Abstract Open Information Extraction (Open IE) serves for the analysis of vast amounts of texts by extraction of assertions, or relations, in the form of tuples (argument 1; relation; argument 2). Various approaches to Open IE have been designed to perform in a fast, … Cited by 1 Related articles All 11 versions

Leveraging Linguistic Structure For Open Domain Information Extraction G Angeli, MJ Premkumar, CD Manning – nlp.stanford.edu … 1 Introduction Open information extraction (open IE) has been shown to be useful in a number of NLP tasks, such as question answering (Fader et al., 2014), relation extraction (Soderland et al., 2010), and information retrieval (Etzioni, 2011). …

N-ary Relation Approach for Open Domain Question Answering System Based on Information Extraction through World Wide Web R Yadav, SR Tandan – ijeas.org … They introduced Open Information Extraction paradigm which is basis for question-answering system. …