Coarticulatory influences of liquids on vowels in English

Alison Tunley
King’s College

A dissertation submitted in candidature for the degree of Doctor of Philosophy
Department of Linguistics
University of Cambridge
April 1999

Declaration

I hereby declare that this thesis is not substantially the same as any that I have submitted for a degree or diploma or other qualification at any other university. I further state that no part of my thesis has already been or is being currently submitted for any such degree, diploma or other qualification. This thesis is the result of my own work and includes nothing which is the outcome of work done in collaboration. This thesis does not exceed 80,000 words, including footnotes, references and appendices, but excluding bibliographies.

Summary

This thesis explores the coarticulatory influence of /r/ and /l/ on vowels, locally in VC and CV sequences and over a longer domain where the vowel is separated from the influencing consonant by two other segments. The primary motivation behind the production studies is to improve the quality of rule-generated synthetic speech. Even high quality synthetic speech is immediately recognizable as being computer-generated; such speech is hard to understand in noisy listening conditions and requires significantly more cognitive processing effort than natural speech. One contributor to this inferior quality is the absence in the synthetic speech signal of subtle but systematic context-induced acoustic detail. Although a great deal of work has been done on coarticulatory variation, there has been rather little exploration of long-domain coarticulatory effects or of the interaction between metrical and segmental factors which influence patterns of coarticulatory variation.
The production studies in this thesis provide detailed information regarding the influence of liquids on surrounding vowels and thus are a starting point for perceptual studies assessing the contribution of such acoustic detail to the quality of synthetic speech. Factors such as stress and vowel quality are varied to establish criteria which favour or discourage the spread of coarticulatory influence. The interaction between metrical and segmental factors is explored by looking at stressed and unstressed CV sequences in feet of different lengths and in different positions in the foot. More complex segmental influences on coarticulatory behaviour are examined by incorporating /r/s and /l/s in consonant clusters.

A perceptual experiment is conducted to assess the salience of some of the coarticulatory variation described in the thesis. The experiment shows that incorporating coarticulatory detail in synthetic speech spread over /ə r V C ə/ sequences can improve segmental intelligibility by 7–28%. The degree to which such coarticulatory detail contributes to intelligibility is partially dependent on lexical effects, in that the biggest improvements in intelligibility after including coarticulatory detail were found for nonsense words, with rather smaller improvements for monosyllabic and polysyllabic real words.

Thanks

For financial support I am grateful to the British Academy, the Newton Trust and King’s College.

I am grateful to my supervisor, Sarah Hawkins for her guidance in all aspects of this project. Thanks also to Geoffrey Potter, the Phonetics Laboratory technician, for his enduring patience and good humour and help with a variety of technical problems.

Without the support of the Phonetics lab posse this thesis would never have been finished. In particular, thanks to Jonathan Rodgers for guru tricks, the LaTeX books and many laughs, and to Kimberley Farrar, Eric Fixmer and Elinor Payne for great coffee breaks.
Many other friends have also kept me going; they know who they are and I owe them all a lot of beer. Finally, thanks go to Daniel for keeping me sane and reminding me constantly about what’s really important in life.

Note

SSBE stands for Standard Southern British English
Statistical significance is taken to be p ≤ 0.05

Detail matters

“You need a certain amount of complexity to do any particular job. A Saturn V rocket is said to have had seven million parts, all of which had to work. That’s not entirely true. Many of those parts were redundant. But that redundancy was absolutely necessary to achieve the goal of putting someone on the moon in 1969. So if some of those rocket parts had the job of being redundant, then each of those parts still had to do their part. So to speak. They also serve who only stand and wait. We betray ourselves when we say That’s redundant, meaning That’s useless. Redundancy is not always redundant, whether you’re talking about rockets or human languages or computer languages. In short, simplicity is often the enemy of success.”

Larry Wall, August 25th 1998
2nd State of the Onion at http://www.perl.com

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Background
  1.2 Acoustic variation and speech perception
  1.3 Speech synthesis
    1.3.1 Processing problems in synthetic speech
    1.3.2 Relating naturalness, intelligibility and comprehension
  1.4 Overview of thesis

2 Influence of liquids on following vowels
  2.1 Introduction
    2.1.1 On the coarticulatory resistance of /i/
    2.1.2 Language- and accent-specific differences
    2.1.3 Resolving contradictions over variability of /i/
  2.2 Research Questions
  2.3 Materials
  2.4 Recording
  2.5 Measurements and statistical analysis
    2.5.1 Measuring formant frequencies in vowels
    2.5.2 Measuring formant frequencies in consonants
    2.5.3 Measuring formant frequencies in schwa
  2.6 Results and Discussion
    2.6.1 Consonant to vowel coarticulation in CV sequences
    2.6.2 Anticipatory vowel coarticulation in consonants in CV sequences
    2.6.3 Spread of consonant-induced coarticulation to non-adjacent schwa
  2.7 Summary and conclusions

3 Intelligibility of synthetic speech
  3.1 Introduction to Perceptual Testing
  3.2 Hypotheses and sentence design
  3.3 Synthesis of the test sentences
    3.3.1 Background to synthesis process
    3.3.2 Incorporating coarticulatory detail in vowels in Set B
    3.3.3 Incorporating coarticulatory detail in schwas in Set B
    3.3.4 Incorporating coarticulatory detail in consonants in Set B
    3.3.5 Incorporating coarticulatory detail in Sets A and C
  3.4 Filler sentences
  3.5 Adding noise to the speech stimuli
  3.6 Experimental tapes: design considerations
    3.6.1 Statistical Analysis
  3.7 Subjects
  3.8 Procedure
  3.9 Results and Discussion
    3.9.1 The r-syllable results
  3.10 Summary and conclusions
    3.10.1 Speech style
    3.10.2 Implications for synthesis applications

4 Temporal course of rhotic resonance effects
  4.1 Introduction
  4.2 Experimental Design
    4.2.1 Independent variables
    4.2.2 Hypotheses and Questions
    4.2.3 Sentences
  4.3 Recording
  4.4 Measurements and statistical analysis
  4.5 Results and Discussion
    4.5.1 r-colouring in adjacent vowels
    4.5.2 Vowel-to-vowel coarticulation
    4.5.3 r-colouring in non-adjacent stressed vowels
    4.5.4 r-colouring in non-adjacent unstressed vowels
  4.6 Summary and conclusions

5 Coarticulation after consonant clusters
  5.1 Introduction
  5.2 Research Questions
  5.3 Materials
  5.4 Recording
  5.5 Measurements and statistical analysis
  5.6 Results and Discussion
    5.6.1 Spectral characteristics of vowels after a variety of syllable onsets
    5.6.2 Durational properties of vowels after a variety of syllable onsets
    5.6.3 Conclusions for vowel data
  5.7 Realization of consonants in clusters
    5.7.1 Realization of /r/ in consonant clusters
    5.7.2 Realization of alveolars and velars in /Cr/ and /sCr/ sequences
    5.7.3 Realization of /l/ in consonant clusters
    5.7.4 Summary of EPG and acoustic data for consonants
  5.8 Summary and conclusions

6 Metrical influences on liquid coarticulation
  6.1 Introduction
  6.2 Research Questions
    6.2.1 Hypotheses relating to foot-length
    6.2.2 Hypotheses relating to syllable’s position in the foot
  6.3 Materials
    6.3.1 Exploring foot-length and syllable compression
    6.3.2 Exploring the impact of syllable position in the foot
  6.4 Recording
  6.5 Measurements and Analysis
  6.6 Results and Discussion
    6.6.1 The effect of foot-length on vowel durations
    6.6.2 The effect of foot-length on spectral properties of vowels
    6.6.3 The effect of syllable position in the foot on coarticulation
  6.7 Summary and Conclusions

7 Concluding Remarks
  7.1 Susceptibility to coarticulatory influence
  7.2 Long-domain rhotic resonance effects
  7.3 Temporal variation and coarticulatory effects
    7.3.1 Segmental influences on timing
    7.3.2 Metrical influences on timing
    7.3.3 Summary: Temporal variation and coarticulatory effects
  7.4 Perceptual salience of coarticulatory detail
    7.4.1 Lexical influences on the salience of coarticulatory detail
    7.4.2 Perceptual coherence and perceptual testing
    7.4.3 Importance of sensitive and application-oriented perceptual testing
  7.5 Afterword: Combined influences on r-colouring

A Pre-synthesis production study
  A.1 Background to recording
  A.2 Measurements
  A.3 Results

B Acoustic data for isolated vowels
  B.1 Background
  B.2 Results

List of Figures

1.1 Intelligibility of natural and synthetic speech in noise
2.1 Schematized outline of experiment
2.2 Spectrogram of the utterance the reap
2.3 F2 and F3 in vowels in /h/, /l/ and /r/ contexts
2.4 F2 and F3 in vowels in /hV/ and /rV/
2.5 F2 and F3 in vowels in /hV/ and /lV/
2.6 F2 and F3 in /h/, /l/ and /r/
2.7 F2 in schwa in preceding vowel contexts
2.8 Schematized spectrograms of /hV/, /lV/ and /rV/
2.9 Schematized spectrograms of /ri, rɪ, rɛ, ræ/
3.1 Schematized outline of sequence of interest
3.2 Schema for design of perceptual experiment
3.3 % phonemes correct for rule and edited forms
3.4 % phonemes correct for rule and edited forms of r-context words
3.5 % phonemes correct for rule and edited forms of r-context words for each word set
4.1 Schematized outline of sequences of interest in long-domain experiment
4.2 F2, F3 and F4 in vowels in /Vr/, /rV/, /Vh/ and /hV/
4.3 F2, F3 and F4 in vowels adjacent to /r/ or /h/ in stressed or unstressed neighbouring contexts
4.4 Schematized spectrograms of vowels adjacent to /r/ or /h/
4.5 F2, F3 and F4 in unstressed vowels in non-adjacent /r/ and /h/ contexts
4.6 F2, F3 and F4 in unstressed /i ɪ ə/ in non-adjacent /r/ and /h/ contexts
4.7 F2, F3 and F4 in non-adj. unstr. vowels by stress of influencing syllable
4.8 F2, F3 and F4 in non-adj. unstr. vowels by direction of influence
5.1 F2, F3 and F4 in Hz in vowels after consonant clusters
5.2 F2, F3 and F4 in ERBs in vowels after consonant clusters
5.3 Vowel duration after various syllable onsets
5.4 Controlled words: vowel durations after various syllable onsets
5.5 Spectrograms of /riː/, /kriː/ and /skriː/