Advanced Applications of Natural Language Processing for Performing Information Extraction (2015), by Mario Rodrigues & Antonio Teixeira
Contents
1 Introduction
1.1 Document Society
1.2 Problems
1.3 Semantics and Knowledge Representation
1.4 Natural Language Processing
1.5 Information Extraction
1.5.1 Main Challenges in Information Extraction
1.5.2 Approaches to Information Extraction
1.5.3 Performance Measures
1.5.4 General Architecture for Information Extraction
1.6 Book Structure
References
2 Data Gathering, Preparation and Enrichment
2.1 Process Overview
2.2 Tokenization and Sentence Boundary Detection
2.2.1 Tools
2.2.2 Representative Tools: Punkt and iSentenizer
2.3 Morphological Analysis and Part-of-Speech Tagging
2.3.1 Tools
2.3.2 Representative Tools: Stanford POS Tagger, SVMTool, and TreeTagger
2.4 Syntactic Parsing
2.4.1 Representative Tools: Epic, StanfordParser, MaltParser, TurboParser
2.5 Representative Software Suites
2.5.1 Stanford NLP
2.5.2 Natural Language Toolkit (NLTK)
2.5.3 GATE
References
3 Identifying Things, Relations, and Semantizing Data
3.1 Identifying the Who, the Where, and the When
3.2 Relating Who, What, When, and Where
3.3 Getting Everything Together
3.3.1 Ontology
3.3.2 Ontology-Based Information Extraction (OBIE)
References
4 Extracting Relevant Information Using a Given Semantic
4.1 Introduction
4.2 Defining How and What Information Will Be Extracted
4.3 Architecture
4.4 Implementation of a Prototype Using State-of-the-Art Tools
4.4.1 Natural Language Processing
4.4.2 Domain Representation
4.4.3 Semantic Extraction and Integration
References
5 Application Examples
5.1 A Tutorial Example
5.1.1 Selecting and Obtaining Software Tools
5.1.2 Tools Setup
5.1.3 Processing the Target Document
5.1.4 Using for Other Languages and for Syntactic Parsing
5.2 Application Example 2: IE Applied to Electronic Government
5.2.1 Goals
5.2.2 Documents
5.2.3 Obtaining the Documents
5.2.4 Application Setup
5.2.5 Making Available Extracted Information Using a Map
5.2.6 Conducting Semantic Information Queries
References
6 Conclusion
Index