Learning to Rank for Information Retrieval


by Tie-Yan Liu (2011)


Contents

Part I Overview of Learning to Rank

1 Introduction . . . 3

1.1 Overview . . . 3

1.2 Ranking in Information Retrieval . . . 7

1.2.1 Conventional Ranking Models . . . 7

1.2.2 Query-Level Position-Based Evaluations . . . 11

1.3 Learning to Rank . . . 15

1.3.1 Machine Learning Framework . . . 16

1.3.2 Definition of Learning to Rank . . . 17

1.3.3 Learning-to-Rank Framework . . . 18

1.4 Book Overview . . . 23

1.5 Exercises . . . 24

References . . . 25

Part II Major Approaches to Learning to Rank

2 The Pointwise Approach . . . 33

2.1 Overview . . . 33

2.2 Regression-Based Algorithms . . . 33

2.2.1 Subset Ranking with Regression . . . 34

2.3 Classification-Based Algorithms . . . 35

2.3.1 Binary Classification for Ranking . . . 35

2.3.2 Multi-class Classification for Ranking . . . 37

2.4 Ordinal Regression-Based Algorithms . . . 39

2.4.1 Perceptron-Based Ranking (PRanking) . . . 39

2.4.2 Ranking with Large Margin Principles . . . 40

2.4.3 Ordinal Regression with Threshold-Based Loss Functions . . . 42

2.5 Discussions . . . 42

2.5.1 Relationship with Relevance Feedback . . . 43

2.5.2 Problems with the Pointwise Approach . . . 44

2.5.3 Improved Algorithms . . . 44

2.6 Summary . . . 45

2.7 Exercises . . . 45

References . . . 46

3 The Pairwise Approach . . . 49

3.1 Overview . . . 49

3.2 Example Algorithms . . . 49

3.2.1 Ordering with Preference Function . . . 49

3.2.2 SortNet: Neural Network-Based Sorting Algorithm . . . 51

3.2.3 RankNet: Learning to Rank with Gradient Descent . . . 52

3.2.4 FRank: Ranking with a Fidelity Loss . . . 53

3.2.5 RankBoost . . . 54

3.2.6 Ranking SVM . . . 56

3.2.7 GBRank . . . 58

3.3 Improved Algorithms . . . 59

3.3.1 Multiple Hyperplane Ranker . . . 59

3.3.2 Magnitude-Preserving Ranking . . . 60

3.3.3 IR-SVM . . . 61

3.3.4 Robust Pairwise Ranking with Sigmoid Functions . . . 62

3.3.5 P-norm Push . . . 63

3.3.6 Ordered Weighted Average for Ranking . . . 64

3.3.7 LambdaRank . . . 65

3.3.8 Robust Sparse Ranker . . . 66

3.4 Summary . . . 67

3.5 Exercises . . . 67

References . . . 68

4 The Listwise Approach . . . 71

4.1 Overview . . . 71

4.2 Minimization of Measure-Specific Loss . . . 72

4.2.1 Measure Approximation . . . 72

4.2.2 Bound Optimization . . . 77

4.2.3 Non-smooth Optimization . . . 78

4.2.4 Discussions . . . 80

4.3 Minimization of Non-measure-Specific Loss . . . 80

4.3.1 ListNet . . . 81

4.3.2 ListMLE . . . 82

4.3.3 Ranking Using Cumulative Distribution Networks . . . 83

4.3.4 BoltzRank . . . 84

4.4 Summary . . . 85

4.5 Exercises . . . 86

References . . . 87

5 Analysis of the Approaches . . . 89

5.1 Overview . . . 89

5.2 The Pointwise Approach . . . 89

5.3 The Pairwise Approach . . . 91

5.4 The Listwise Approach . . . 94

5.4.1 Non-measure-Specific Loss . . . 94

5.4.2 Measure-Specific Loss . . . 95

5.5 Summary . . . 98

5.6 Exercises . . . 98

References . . . 99

Part III Advanced Topics in Learning to Rank

6 Relational Ranking . . . 103

6.1 General Relational Ranking Framework . . . 104

6.1.1 Relational Ranking SVM . . . 104

6.1.2 Continuous Conditional Random Fields . . . 106

6.2 Learning Diverse Ranking . . . 107

6.3 Discussions . . . 110

References . . . 111

7 Query-Dependent Ranking . . . 113

7.1 Query-Dependent Loss Function . . . 113

7.2 Query-Dependent Ranking Function . . . 115

7.2.1 Query Classification-Based Approach . . . 115

7.2.2 K Nearest Neighbor-Based Approach . . . 116

7.2.3 Query Clustering-Based Approach . . . 118

7.2.4 Two-Layer Learning Approach . . . 119

7.3 Discussions . . . 120

References . . . 121

8 Semi-supervised Ranking . . . 123

8.1 Inductive Approach . . . 123

8.2 Transductive Approach . . . 124

8.3 Discussions . . . 125

References . . . 125

9 Transfer Ranking . . . 127

9.1 Feature-Level Transfer Ranking . . . 128

9.2 Instance-Level Transfer Ranking . . . 128

9.3 Discussions . . . 130

References . . . 130

Part IV Benchmark Datasets for Learning to Rank

10 The LETOR Datasets . . . 133

10.1 Overview . . . 133

10.2 Document Corpora . . . 133

10.2.1 The “Gov” Corpus and Six Query Sets . . . 134

10.2.2 The OHSUMED Corpus . . . 134

10.2.3 The “Gov2” Corpus and Two Query Sets . . . 135

10.3 Document Sampling . . . 135

10.4 Feature Extraction . . . 136

10.5 Meta Information . . . 136

10.6 Learning Tasks . . . 138

10.7 Discussions . . . 142

References . . . 142

11 Experimental Results on LETOR . . . 145

11.1 Experimental Settings . . . 145

11.2 Experimental Results on LETOR 3.0 . . . 146

11.3 Experimental Results on LETOR 4.0 . . . 149

11.4 Discussions . . . 150

11.5 Exercises . . . 151

References . . . 151

12 Other Datasets . . . 153

12.1 Yahoo! Learning-to-Rank Challenge Datasets . . . 153

12.2 Microsoft Learning-to-Rank Datasets . . . 154

12.3 Discussions . . . 155

References . . . 155

Part V Practical Issues in Learning to Rank

13 Data Preprocessing for Learning to Rank . . . 159

13.1 Overview . . . 159

13.2 Ground Truth Mining from Logs . . . 160

13.2.1 User Click Models . . . 160

13.2.2 Click Data Enhancement . . . 166

13.3 Training Data Selection . . . 168

13.3.1 Document and Query Selection for Labeling . . . 169

13.3.2 Document and Query Selection for Training . . . 171

13.3.3 Feature Selection for Training . . . 175

13.4 Summary . . . 176

13.5 Exercises . . . 176

References . . . 177

14 Applications of Learning to Rank . . . 181

14.1 Overview . . . 181

14.2 Question Answering . . . 181

14.2.1 Definitional QA . . . 182

14.2.2 Quantity Consensus QA . . . 183

14.2.3 Non-factoid QA . . . 184

14.2.4 Why QA . . . 185

14.3 Multimedia Retrieval . . . 186

14.4 Text Summarization . . . 187

14.5 Online Advertising . . . 188

14.6 Summary . . . 189

14.7 Exercises . . . 190

References . . . 190

Part VI Theories in Learning to Rank

15 Statistical Learning Theory for Ranking . . . 195

15.1 Overview . . . 195

15.2 Statistical Learning Theory . . . 195

15.3 Learning Theory for Ranking . . . 197

15.3.1 Statistical Ranking Framework . . . 197

15.3.2 Generalization Analysis for Ranking . . . 198

15.3.3 Statistical Consistency for Ranking . . . 198

15.4 Exercises . . . 199

References . . . 199

16 Statistical Ranking Framework . . . 201

16.1 Document Ranking Framework . . . 202

16.1.1 The Pointwise Approach . . . 202

16.1.2 The Pairwise Approach . . . 202

16.1.3 The Listwise Approach . . . 204

16.2 Subset Ranking Framework . . . 204

16.2.1 The Pointwise Approach . . . 205

16.2.2 The Pairwise Approach . . . 205

16.2.3 The Listwise Approach . . . 206

16.3 Two-Layer Ranking Framework . . . 206

16.3.1 The Pointwise Approach . . . 206

16.3.2 The Pairwise Approach . . . 207

16.3.3 The Listwise Approach . . . 208

16.4 Summary . . . 208

16.5 Exercises . . . 208

References . . . 209

17 Generalization Analysis for Ranking . . . 211

17.1 Overview . . . 211

17.2 Uniform Generalization Bounds for Ranking . . . 212

17.2.1 For Document Ranking . . . 212

17.2.2 For Subset Ranking . . . 214

17.2.3 For Two-Layer Ranking . . . 216

17.3 Algorithm-Dependent Generalization Bound . . . 217

17.3.1 For Document Ranking . . . 218

17.3.2 For Subset Ranking . . . 219

17.3.3 For Two-Layer Ranking . . . 220

17.4 Summary . . . 220

17.5 Exercises . . . 221

References . . . 221

18 Statistical Consistency for Ranking . . . 223

18.1 Overview . . . 223

18.2 Consistency Analysis for Document Ranking . . . 224

18.2.1 Regarding Pairwise 0–1 Loss . . . 224

18.3 Consistency Analysis for Subset Ranking . . . 224

18.3.1 Regarding DCG-Based Ranking Error . . . 225

18.3.2 Regarding Permutation-Level 0–1 Loss . . . 225

18.3.3 Regarding Top-k True Loss . . . 226

18.3.4 Regarding Weighted Kendall’s τ . . . 227

18.4 Consistency Analysis for Two-Layer Ranking . . . 229

18.5 Summary . . . 229

18.6 Exercises . . . 230

References . . . 230

Part VII Summary and Outlook

19 Summary . . . 235

References . . . 238

20 Future Work . . . 241

20.1 Sample Selection Bias . . . 241

20.2 Direct Learning from Logs . . . 242

20.3 Feature Engineering . . . 243

20.4 Advanced Ranking Models . . . 243

20.5 Large-Scale Learning to Rank . . . 244

20.6 Online Complexity Versus Accuracy . . . 245

20.7 Robust Learning to Rank . . . 245

20.8 Online Learning to Rank . . . 246

20.9 Beyond Ranking . . . 247

References . . . 247

Part VIII Appendix

21 Mathematical Background . . . 251

21.1 Probability Theory . . . 251

21.1.1 Probability Space and Random Variables . . . 251

21.1.2 Probability Distributions . . . 252

21.1.3 Expectations and Variances . . . 254

21.2 Linear Algebra and Matrix Computation . . . 255

21.2.1 Notations . . . 255

21.2.2 Basic Matrix Operations and Properties . . . 256

21.2.3 Eigenvalues and Eigenvectors . . . 261

21.3 Convex Optimization . . . 262

21.3.1 Convex Set and Convex Function . . . 262

21.3.2 Conditions for Convexity . . . 263

21.3.3 Convex Optimization Problem . . . 263

21.3.4 Lagrangian Duality . . . 264

21.3.5 KKT Conditions . . . 265

References . . . 266

22 Machine Learning . . . 267

22.1 Regression . . . 267

22.1.1 Linear Regression . . . 267

22.1.2 Probabilistic Explanation . . . 268

22.2 Classification . . . 269

22.2.1 Neural Networks . . . 270

22.2.2 Support Vector Machines . . . 271

22.2.3 Boosting . . . 273

22.2.4 K Nearest Neighbor (KNN) . . . 274

22.3 Statistical Learning Theory . . . 274

22.3.1 Formalization . . . 275

22.3.2 Bounds for |R(g) − R̂(g)| . . . 277

References . . . 282

Index . . . 283