Syntax-Based Collocation Extraction


Syntax-Based Collocation Extraction (2011), by Violeta Seretan


Contents

1 Introduction . . . 1

1.1 Collocations and Their Relevance for NLP . . . 1

1.2 The Need for Syntax-Based Collocation Extraction . . . 3

1.3 Aims . . . 4

1.4 Chapters Outline . . . 6

2 On Collocations . . . 9

2.1 Introduction . . . 9

2.2 A Survey of Definitions . . . 10

2.2.1 Statistical Approaches . . . 11

2.2.2 Linguistic Approaches . . . 12

2.2.3 Collocation vs. Co-occurrence . . . 13

2.3 Towards a Core Collocation Concept . . . 14

2.4 Theoretical Perspectives on Collocations . . . 17

2.4.1 Contextualism . . . 17

2.4.2 Text Cohesion . . . 18

2.4.3 Meaning-Text Theory . . . 19

2.4.4 Semantics and Metaphoricity . . . 20

2.4.5 Lexis-Grammar Interface . . . 21

2.5 Linguistic Descriptions . . . 22

2.5.1 Semantic Compositionality . . . 22

2.5.2 Morpho-Syntactic Characterisation . . . 24

2.6 What Collocation Means in This Book . . . 26

2.7 Summary . . . 27

3 Survey of Extraction Methods . . . 29

3.1 Introduction . . . 29

3.2 Extraction Techniques . . . 29

3.2.1 Collocation Features Modelled . . . 29

3.2.2 General Extraction Architecture . . . 31

3.2.3 Contingency Tables . . . 32

3.2.4 Association Measures . . . 34

3.2.5 Criteria for the Application of Association Measures . . . 42

3.3 Linguistic Preprocessing . . . 44

3.3.1 Lemmatization . . . 44

3.3.2 POS Tagging . . . 45

3.3.3 Shallow and Deep Parsing . . . 47

3.3.4 Beyond Parsing . . . 48

3.4 Survey of the State of the Art . . . 49

3.4.1 English . . . 50

3.4.2 German . . . 51

3.4.3 French . . . 54

3.4.4 Other Languages . . . 56

3.5 Summary . . . 58

4 Syntax-Based Extraction . . . 59

4.1 Introduction . . . 59

4.2 The Fips Multilingual Parser . . . 62

4.3 Extraction Method . . . 65

4.3.1 Candidate Identification . . . 65

4.3.2 Candidate Ranking . . . 68

4.4 Evaluation . . . 69

4.4.1 On Collocation Extraction Evaluation . . . 69

4.4.2 Evaluation Method . . . 72

4.4.3 Experiment 1: Monolingual Evaluation . . . 75

4.4.4 Results of Experiment 1 . . . 79

4.4.5 Experiment 2: Cross-Lingual Evaluation . . . 81

4.4.6 Results of Experiment 2 . . . 85

4.5 Qualitative Analysis . . . 88

4.5.1 Error Analysis . . . 89

4.5.2 Intersection and Rank Correlation . . . 92

4.5.3 Instance-Level Analysis . . . 94

4.6 Discussion . . . 97

4.7 Summary . . . 100

5 Extensions . . . 103

5.1 Identification of Complex Collocations . . . 103

5.1.1 The Method . . . 104

5.1.2 Experimental Results . . . 107

5.1.3 Related Work . . . 109

5.2 Data-Driven Induction of Syntactic Patterns . . . 111

5.2.1 The Method . . . 112

5.2.2 Experimental Results . . . 113

5.2.3 Related Work . . . 114

5.3 Corpus-Based Collocation Translation . . . 116

5.3.1 The Method . . . 116

5.3.2 Experimental Results . . . 118

5.3.3 Related Work . . . 120

5.4 Summary . . . 121

6 Conclusion . . . 123

6.1 Main Contributions . . . 123

6.2 Future Directions . . . 125

A List of Collocation Dictionaries . . . 129

B List of Collocation Definitions . . . 131

C Association Measures – Mathematical Notes . . . 133

C.1 χ² . . . 133

C.2 Log-Likelihood Ratio . . . 134

D Monolingual Evaluation (Experiment 1) . . . 135

D.1 Test Data and Annotations . . . 135

D.2 Results . . . 154

E Cross-Lingual Evaluation (Experiment 2) . . . 157

E.1 Test Data and Annotations . . . 157

E.2 Results . . . 195

F Output Comparison . . . 197

References . . . 199

Index . . . 213