Speech Processing and Soft Computing (2011) by Sid-Ahmed Selouani
Contents
1 Introduction . . . 1
1.1 Soft Computing Paradigm . . . 1
1.2 Soft Computing in Speech Processing . . . 2
1.3 Organization of the Book . . . 2
1.4 Note to the Reader . . . 4
Part I Soft Computing and Speech Enhancement
2 Speech Enhancement Paradigm . . . 7
2.1 Speech Enhancement Usefulness . . . 7
2.2 Noise Characteristics and Estimation . . . 8
2.2.1 Noise Characteristics . . . 8
2.2.2 Noise Estimation . . . 9
2.3 Overview of Speech Enhancement Methods . . . 10
2.3.1 Spectral Subtractive Techniques . . . 10
2.3.2 Statistical-model-based Techniques . . . 10
2.3.3 Subspace Decomposition Techniques . . . 11
2.3.4 Perceptual-based Techniques . . . 12
2.4 Evaluation of Speech Enhancement Algorithms . . . 12
2.4.1 Time-Domain Measures . . . 13
2.4.2 Spectral Domain Measures . . . 13
2.4.3 Perceptual Domain Measures . . . 13
2.5 Summary . . . 14
3 Connectionist Subspace Decomposition for Speech Enhancement . . . 15
3.1 Method Overview . . . 15
3.2 Definitions . . . 16
3.3 Eigenvalue Decomposition . . . 16
3.4 Singular Value Decomposition . . . 18
3.5 KLT Model Identification in the Mel-scaled Cepstrum . . . 19
3.6 Two-Stage Noise Removal Technique . . . 21
3.7 Experiments . . . 22
3.8 Summary . . . 24
4 Variance of the Reconstruction Error Technique . . . 25
4.1 General Principle . . . 25
4.2 KLT Speech Enhancement using VRE Criterion . . . 26
4.2.1 Optimized VRE . . . 27
4.2.2 Signal Reconstruction . . . 28
4.3 Evaluation of the KLT-VRE Enhancement Method . . . 29
4.3.1 Speech Material . . . 29
4.3.2 Baseline Systems and Comparison Results . . . 29
4.4 Summary . . . 32
5 Evolutionary Techniques for Speech Enhancement . . . 33
5.1 Principle of the Method . . . 33
5.2 Global Framework of Evolutionary Subspace Filtering Method . . . 34
5.3 Hybrid KLT-GA Enhancement . . . 34
5.3.1 Solution Representation . . . 35
5.3.2 Selection Function . . . 35
5.3.3 Crossover and Mutation . . . 36
5.4 Objective Function and Termination . . . 37
5.5 Experiments . . . 37
5.5.1 Speech Databases . . . 38
5.5.2 Experimental Setup . . . 38
5.5.3 Performance Evaluation . . . 39
5.6 Summary . . . 40
Part II Soft Computing and Automatic Speech Recognition
6 Robustness of Automatic Speech Recognition . . . 43
6.1 Evolution of Speech Recognition Systems . . . 43
6.2 Speech Recognition Problem . . . 44
6.3 Robust Representation of Speech Signals . . . 46
6.3.1 Cepstral Acoustic Features . . . 46
6.3.2 Robust Auditory-Based Phonetic Features . . . 47
6.4 ASR Robustness . . . 52
6.4.1 Signal Compensation Techniques . . . 53
6.4.2 Feature Space Techniques . . . 53
6.4.3 Model Space Techniques . . . 54
6.5 Speech Recognition and Human-Computer Dialog . . . 57
6.5.1 Dialog Management Systems . . . 58
6.5.2 Dynamic Pattern Matching Dialog Application . . . 59
6.6 ASR Robustness and Soft Computing Paradigm . . . 61
6.7 Summary . . . 62
7 Artificial Neural Networks and Speech Recognition . . . 63
7.1 Related Work . . . 63
7.2 Hybrid HMM/ANN Systems . . . 64
7.3 Autoregressive Time-Delay Neural Networks . . . 65
7.4 AR-TDNN vs. TDNN . . . 67
7.5 HMM/AR-TDNN Hybrid Structure . . . 68
7.6 Experiment and Results . . . 69
7.6.1 Speech Material and Tools . . . 70
7.6.2 Setup of the Classification Task . . . 71
7.6.3 Discussion . . . 72
7.7 Summary . . . 73
8 Evolutionary Algorithms and Speech Recognition . . . 75
8.1 Expected Advantages . . . 75
8.2 Problem Statement . . . 76
8.3 Multi-Stream Statistical Framework . . . 77
8.4 Hybrid KLT-VRE-GA-based Front-End Optimization . . . 78
8.5 Evolutionary Subspace Decomposition using Variance of Reconstruction Error . . . 79
8.5.1 Individuals’ Representation and Initialization . . . 79
8.5.2 Selection Function . . . 80
8.5.3 Objective Function . . . 81
8.5.4 Genetic Operators and Termination Criterion . . . 81
8.6 Experiments and Results . . . 82
8.6.1 Speech Material . . . 82
8.6.2 Recognition Platform . . . 83
8.6.3 Tests & Results . . . 83
8.7 Summary . . . 85
9 Speaker Adaptation Using Evolutionary-based Approach . . . 87
9.1 Speaker Adaptation Approaches . . . 87
9.2 MPE-based Discriminative Linear Transforms for Speaker Adaptation . . . 88
9.3 Evolutionary Linear Transformation Paradigm . . . 90
9.3.1 Population Initialization . . . 91
9.3.2 Objective Function . . . 92
9.3.3 Selection Function . . . 92
9.3.4 Recombination . . . 93
9.3.5 Mutation . . . 94
9.3.6 Termination . . . 94
9.4 Experiments . . . 95
9.4.1 Resources and Tools . . . 95
9.4.2 Genetic Algorithm Parameters . . . 95
9.4.3 Result Discussion . . . 95
9.5 Summary . . . 96
References . . . 97