DeepDive

Notes:

DeepDive is a data management and machine learning platform that uses the Entity-Relationship model to represent relationships in structured data. It applies techniques such as distant supervision and Markov logic to combine many kinds of evidence and signals in a single model, which lets it extract structured information from unstructured data sources. DeepDive is designed to be flexible and scalable, and has been used for knowledge base construction, information extraction, and predictive modeling.
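To make the idea of distant supervision concrete, here is a minimal sketch in Python. It is not DeepDive code: the seed facts, the function names, and the example sentences are all hypothetical and chosen only for illustration. The point is that labels come from matching candidate entity pairs against facts already known, rather than from hand annotation of each sentence.

```python
from typing import List, Optional, Set, Tuple

Candidate = Tuple[str, str, str]  # (sentence, entity1, entity2)

# Hypothetical seed facts, invented for this illustration only.
KNOWN_SPOUSES: Set[Tuple[str, str]] = {("Barack Obama", "Michelle Obama")}
KNOWN_NON_SPOUSES: Set[Tuple[str, str]] = {("Barack Obama", "Malia Obama")}


def distant_label(e1: str, e2: str) -> Optional[bool]:
    """Return True/False if the seed facts cover the pair, else None."""
    pairs = {(e1, e2), (e2, e1)}  # treat the relation as symmetric
    if pairs & KNOWN_SPOUSES:
        return True
    if pairs & KNOWN_NON_SPOUSES:
        return False
    return None  # unlabeled; left for the model to infer


def label_candidates(candidates: List[Candidate]) -> List[Tuple[Candidate, Optional[bool]]]:
    """Attach a distant-supervision label to every candidate mention pair."""
    return [(c, distant_label(c[1], c[2])) for c in candidates]


candidates = [
    ("Michelle Obama and Barack Obama attended the dinner.", "Michelle Obama", "Barack Obama"),
    ("Barack Obama spoke with Malia Obama.", "Barack Obama", "Malia Obama"),
    ("Barack Obama met Angela Merkel.", "Barack Obama", "Angela Merkel"),
]
for candidate, label in label_candidates(candidates):
    print(label, "|", candidate[0])
```

Note that the first sentence is labeled positive even though it does not itself state the relation. Labels obtained this way are noisy, which is one reason they are usually combined with other evidence rather than trusted on their own.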

One of the key features of DeepDive is its ability to perform deep natural language processing (NLP) on large volumes of unstructured text. It extracts linguistic features such as named-entity mentions and dependency paths, and uses them as evidence when deciding which structured facts the text supports.
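As an illustration of a dependency-path feature, the sketch below finds the shortest path between two mentions in a dependency parse. It assumes the parse is already available as a list of (head, label, dependent) edges; the edges here are written by hand for a made-up sentence, and the function name and feature format are hypothetical rather than DeepDive's own.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (head, label, dependent)


def dependency_path(edges: List[Edge], source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest path between two tokens.

    The parse is treated as an undirected graph; each step records the
    edge label and whether the edge was followed up or down the tree.
    """
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for head, label, dep in edges:
        graph.setdefault(head, []).append((dep, f"-{label}->"))
        graph.setdefault(dep, []).append((head, f"<-{label}-"))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for neighbor, step in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [step, neighbor]))
    return None  # the two tokens are not connected


# Hand-written parse of the made-up sentence "Obama married Michelle in 1992".
edges = [
    ("married", "nsubj", "Obama"),
    ("married", "dobj", "Michelle"),
    ("married", "prep", "in"),
    ("in", "pobj", "1992"),
]
path = dependency_path(edges, "Obama", "Michelle")
# Drop the final token so the feature describes only what lies between the mentions.
feature = " ".join(path[:-1])
print(feature)
```

In a real pipeline the edges would come from a dependency parser, and the resulting path string would be one feature among many for a candidate entity pair.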

In addition to its NLP capabilities, DeepDive uses classic data management and optimization techniques to perform web-scale statistical learning and inference. This allows it to scale to very large data sets and to handle complex predictive modeling tasks. It is this combination of deep NLP with data management techniques that makes DeepDive suited to extracting structured information from unstructured data sources.
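The statistical inference step can be illustrated with a toy Markov-logic-style model. In this sketch each possible world is scored by the exponential of the summed weights of the formulas it satisfies, and marginal probabilities are computed by enumerating every world. The variables, rules, and weights are hypothetical. Brute-force enumeration grows exponentially with the number of variables, so this shows only what is being computed, not how DeepDive computes it at scale.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

World = Dict[str, bool]
WeightedFormula = Tuple[float, Callable[[World], bool]]


def marginals(variables: List[str], formulas: List[WeightedFormula]) -> Dict[str, float]:
    """Exact marginal probability of each variable being True.

    P(world) is proportional to exp(sum of weights of satisfied formulas).
    """
    totals = {v: 0.0 for v in variables}
    partition = 0.0
    for values in itertools.product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        score = math.exp(sum(weight for weight, holds in formulas if holds(world)))
        partition += score
        for v in variables:
            if world[v]:
                totals[v] += score
    return {v: totals[v] / partition for v in variables}


# Hypothetical rules: X tends to hold (weight 1.0), and X implies Y (weight 2.0).
RULES: List[WeightedFormula] = [
    (1.0, lambda w: w["X"]),
    (2.0, lambda w: (not w["X"]) or w["Y"]),
]
result = marginals(["X", "Y"], RULES)
print(result)
```

Larger weights make a formula harder to violate, but unlike hard logical constraints, a violated formula only lowers the probability of a world rather than ruling it out.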

Resources:

Wikipedia:

Entity-relationship model

References:

DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference (2012)

See also:

Alchemy Open Source AI | EntityCube | YAGO-QA | YAGO2

DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference. F Niu, C Zhang, C Ré, JW Shavlik – VLDS, 2012 – www-cs.stanford.edu … However, DeepDive goes deeper in two ways: (1) Unlike prior large-scale KBC systems, DeepDive performs deep natural language processing (NLP) to extract useful … [footnote 1: http://research.cs.wisc.edu/hazy/deepdive; footnote 2: http://lemurproject.org/clueweb09.php/] …

LOOK4: Enhancement of web search results with Universal Words and WordNet A Avetisyan, V Avetisyan – Proc. 5th Int. Conf. on Global WordNet, 2010 – cfilt.iitb.ac.in … The new sets of keywords are being translated into target language with corresponding instructions for the search engine, restricting the natural language domain and the … So what is the main advantage of Look4 over engines like Hakia, Powerset, SenseBot and DeepDive. …

Knowledge Bases in the Age of Big Data Analytics FM Suchanek, G Weikum – Proceedings of the VLDB Endowment, 2014 – vldb.org … and the most salient KB projects, which include KnowItAll [10, 11], BabelNet [22], ConceptNet [28], DBpedia [3, 18], DeepDive [24], Freebase … a pre-specified set of relations and entities, open information extraction harvests arbitrary SPO triples from natural language documents …

Knowledge harvesting in the big-data era F Suchanek, G Weikum – … of the 2013 international conference on …, 2013 – dl.acm.org … hasWonPrize relation, we aim to automatically learn that nominatedForPrize is also an interesting relation expressed by natural-language patterns such as … up Statistical Inference in Markov Logic Networks using an RDBMS, VLDB 2011 [86] F. Niu et al.: DeepDive: Web-scale …

Language-Aware Truth Assessment of Fact Candidates N Nakashole, TM Mitchell – ling.uni-potsdam.de … Within the context of information extraction, fact extractors assign confidence scores to extracted facts. However, such scores are often tied to the extractor’s ability to read and understand natural language text. … Synonymous Relations. Natural language is diverse. …

Knowledge harvesting from text and Web sources F Suchanek, G Weikum – Data Engineering (ICDE), 2013 IEEE …, 2013 – ieeexplore.ieee.org … Set-expansion methods, typically bootstrapped with a few seed instances, exploit special patterns in natural-language sentences or Web tables … WSDM 2011 [23] F. Niu et al.: DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference, VLDS …

Advances in Automated Knowledge Base Construction FM Suchanek, J Fan, R Hoffmann… – SIGMOD Records …, 2013 – suchanek.name … This requires a variety of natural language technology, such as deep semantic understanding of people, events, relations, etc. … The first was DeepDive [31], a web-scale KB population engine that crawls 500M webpages, 400k videos and 20k books. …

Hazy: Making it easier to build and maintain big-data analytics A Kumar, F Niu, C Ré – Communications of the ACM, 2013 – dl.acm.org … several knowledge-based construction systems (namely, DeepDive, GeoDeepDive, and AncientText). Furthermore, our (open source) software stack has been downloaded thousands of times and used by different communities such as natural language processing, chemistry …

Fine-grained Semantic Typing of Emerging Entities. N Nakashole, T Tylenda, G Weikum – ACL (1), 2013 – aclweb.org … In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP- … In Proceedings of the Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2007. …

A Machine Reading System for Assembling Synthetic Paleontological Databases SE Peters, C Zhang, M Livny, C Ré – PloS one, 2014 – dx.plos.org … The first step in the DeepDive process is to perform document parsing, including optical character recognition (OCR), document layout recognition, and natural language processing (NLP) of the text (Fig. 1; Figs. S1–S3 in File S1). …

Feature Engineering for Knowledge Base Construction C Zhang, C Ré, AA Sadeghian, Z Shan, J Shin… – arXiv preprint arXiv: …, 2014 – arxiv.org … As a result, this prototype attacks challenges in optical character recognition, natural language processing, and information extraction and integration. … We found that the KBC system built on DeepDive has achieved comparable—and sometimes better—quality than a knowledge …

Mining Opinions and Trends from Social Media A Gupta – wiki.epfl.ch … C. Incorporating Meta-Information Along with natural language text, social media data contains a variety of meta-information such as personal information of users … ”DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference.” In VLDS, pp. …

Overview of the English Slot Filling Track at the TAC2014 Knowledge Base Population Evaluation M Surdeanu, H Ji – Proc. Text Analysis Conference (TAC2014), 2014 – nlp.cs.rpi.edu … [footnote 1: http://www.itl.nist.gov/iad/mig/tests/ace/] Within this larger effort, the slot filling (SF) subtask must extract the values of specified attributes (or slots) for a given entity from large collections of natural language texts. … [footnote 4: http://deepdive.stanford.edu/] …

Understand Relations in Knowledge Base Construction H Wang – 2014 – tw.rpi.edu … In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. July. 2012, pp. 1135–1145. url: http://dl.acm.org/citation.cfm?id=2391076. [23] F Niu et al. “DeepDive: Web-scale Knowledge …

A machine-compiled macroevolutionary history of Phanerozoic life SE Peters, C Zhang, M Livny, C Ré – arXiv preprint arXiv:1406.2963, 2014 – arxiv.org … interest. The first step in the DeepDive process is to perform document parsing tasks, including optical character recognition (OCR), document layout recognition, and natural language parsing (NLP) of the text (Fig. S1). These …

Combining information extraction and human computing for crowdsourced knowledge acquisition SK Kondreddi, P Triantafillou… – Data Engineering (ICDE …, 2014 – ieeexplore.ieee.org … 18], [44], [51]. When tapping into natural-language text, KA critically relies on information extraction (IE) technology, combining methods from pattern matching, computational linguistics, and statistical learning. Open IE methods …

Hephaestus: Data Reuse for Accelerating Scientific Discovery J Duggan, ML Brodie – people.csail.mit.edu … US DARPA has identified causality modeling as a high priority for the advancement of research [53], although they consider it in the context of natural language processing and not statistical hypothesis testing for open science data. …

Statistical relational data integration for information extraction M Niepert – Reasoning Web. Semantic Technologies for Intelligent …, 2013 – Springer … While this could be explained with the specific applications the creators have in mind (improved keyword search and natural language question answering, for … are aware of and that we are not able to cover here due to space considerations are Freebase [10] and DeepDive [72]. …

From Data Fusion to Knowledge Fusion XL Dong, E Gabrilovich, G Heitz… – Proceedings of the …, 2014 – www-devel.cs.ubc.ca … The second main body of work is related to knowledge base construction, such as YAGO [32], NELL [7], and DeepDive [24]. … For texts, we first run standard natural language processing tools for named entity recognition, parsing, co-reference resolution, etc. …

Smarter data E Maguire, N Seeman, E Meerkamper – 2012 – riwi.com … Experts’ views for expert investors The group of companies that comprise CLSA are affiliates of Credit Agricole Securities (USA) Inc. For important disclosure information please refer to page 40. USA Technology 4 December 2012 …

Faust: Flexible Acquisition and Understanding System for Text LL Voss, DE Wilkins, D Israel, C Manning, D Jurafsky… – 2013 – DTIC Document … The vast majority of scientific and technical knowledge is expressed in natural-language (NL) texts. …

[BOOK] Generative collectives W van Osch – 2012 – dare.uva.nl … Downloaded from UvA-DARE, the institutional repository of the University of Amsterdam (UvA) http://dare.uva.nl/document/354394 …

Data Engineering E Gribkoff, D Suciu, G Van den Broeck, C Ré… – 131.107.65.22 … Chris Re et al. from Stanford describe DeepDive, which is a system for declaratively specifying (and running) large-scale knowledge-base construction workflows. While no one would describe DeepDive as a database system …