
Toward Spontaneous Speech Recognition and Understanding

43 Pages·2003·1.186 MB·English

Sadaoki Furui
Tokyo Institute of Technology, Department of Computer Science
2-12-1, O-okayama, Meguro-ku, Tokyo, 152-8552 Japan
Tel/Fax: +81-3-5734-3480
[email protected]
http://www.furui.cs.titech.ac.jp/

Outline
• Fundamentals of automatic speech recognition
• Acoustic modeling
• Language modeling
• Database (corpus) and task evaluation
• Transcription and dialogue systems
• Spontaneous speech recognition
• Speech understanding
• Speech summarization

Speech recognition technology
[Figure: applications plotted by speaking style (isolated words, connected speech, read speech, fluent speech, spontaneous speech) against vocabulary size (2, 20, 200, 2000, 20000 words, unrestricted), with frontiers marked 1980, 1990 and 2000. Applications shown: voice commands, digit strings, name dialing, directory assistance, form fill by voice, office dictation, word spotting, system-driven dialogue, network agent & intelligent messaging, transcription, 2-way dialogue, natural conversation.]

Categorization of speech recognition tasks
Human to human
  Dialogue (Category I): Switchboard, CallHome (Hub5), meeting task
  Monologue (Category II): Broadcast news (Hub4), lecture, presentation, voicemail
Human to machine
  Dialogue (Category III): ATIS, Communicator, information retrieval, reservation
  Monologue (Category IV): Dictation

Major speech recognition applications
• Conversational systems for accessing information services
  – Robust conversation using wireless handheld/hands-free devices in the real mobile computing environment
  – Multimodal speech recognition technology
• Systems for transcribing, understanding and summarizing ubiquitous speech documents such as broadcast news, meetings, lectures, presentations and voicemails

Mechanism of state-of-the-art speech recognizers
Speech input → Acoustic analysis → $x_1 \ldots x_T$ → Global search → Recognized word sequence
Knowledge sources feeding the global search:
• Phoneme inventory and pronunciation lexicon: $P(x_1 \ldots x_T \mid w_1 \ldots w_k)$
• Language model: $P(w_1 \ldots w_k)$
Global search: maximize $P(x_1 \ldots x_T \mid w_1 \ldots w_k)\, P(w_1 \ldots w_k)$ over $w_1 \ldots w_k$.

State-of-the-art algorithms in speech recognition
• Acoustic analysis: LPC or mel cepstrum, time derivatives, auditory models
• Phoneme inventory: context-dependent, tied-mixture sub-word HMMs, learning from speech data
• Global search: frame-synchronous search, beam search, stack search, fast match, A* search
• Language model: bigram, trigram, FSN, CFG
• Also labeled in the diagram: cepstrum subtraction, SBR, MLLR

Digital sound spectrogram
[Figure: panels for power (0 to 80 dB), spectrum (0 to 6 kHz), waveform, and pitch (0 to 15 ms), plotted against time (0.0 to 0.9).]

Feature vector (short-time spectrum) extraction from speech
[Figure: labels are frame length, time window, frame period, frame, feature vector.]

Spectral structure of speech
[Figure: the short-term speech spectrum (log amplitude against frequency f) shown as a combination of spectral fine structure, labeled F0 (fundamental frequency), and spectral envelope, labeled resonances (formants). The envelope panel carries the label "What we are hearing".]

Relationship between logarithmic spectrum and cepstrum
[Figure: the IDFT maps the logarithmic spectrum to the cepstrum. The spectral fine structure is a fast periodic function of f and the spectral envelope is a slow periodic function of f, so the two concentrate at different positions in the cepstrum.]

Comparison of spectral envelopes by LPC, LPC cepstrum, and FFT cepstrum methods
[Figure: log amplitude (0 to 10) against frequency (0 to 3 kHz). Curves: short-time spectrum, spectral envelope by LPC, spectral envelope by LPC cepstrum, spectral envelope by FFT cepstrum.]

Cepstrum and delta-cepstrum coefficients
[Figure: on the parameter (vector) trajectory, the instantaneous vector is the cepstrum and the transitional (velocity) vector is the delta-cepstrum.]

MFCC-based front-end processor
Speech → FFT (FFT-based spectrum) → Mel-scale triangular filters → Log → DCT → Acoustic vector, with ∆ and ∆² terms

Structure of phoneme HMMs
[Figure: phoneme models for phoneme k-1, phoneme k and phoneme k+1, each with states 1, 2, 3 and output probabilities b_1(x), b_2(x), b_3(x) over the feature vectors x, aligned against time. Transition probabilities shown on the arcs: 0.2, 0.4, 0.7, 0.5, 0.6, 0.3, 0.3.]

Units of speech (after J. Makhoul & R. Schwartz)
[Figure: levels shown for the words "grey whales": words, phonemes, allophones, allophone models, spectrogram (frequency in kHz), and speech signal (amplitude), against time in seconds.]

Language model is crucial!
• Rudolph the red nose reindeer.
• Rudolph the Red knows rain, dear.
• Rudolph the Red Nose reigned here.
• This new display can recognize speech.
• This nudist play can wreck a nice beach.
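The "Language model is crucial" examples and the global-search rule (maximize P(x|w) P(w) over word sequences) can be illustrated with a small sketch. This is not the lecture's system: the training text, the equal acoustic scores, the add-one smoothing, and every function name below are assumptions made up for illustration. It only shows how a bigram language model can break a tie between two word sequences that sound nearly alike.

```python
import math
from collections import Counter

# Hypothetical toy training text, invented purely for illustration.
CORPUS = [
    "this new display can recognize speech",
    "this display can recognize words",
    "we can recognize speech",
    "a nice beach",
]

# Two competing hypotheses with ASSUMED, equal acoustic log-likelihoods
# log P(x|w). Real values would come from the acoustic models.
HYPOTHESES = {
    "this new display can recognize speech": -100.0,
    "this nudist play can wreck a nice beach": -100.0,
}


def tokenize(sentence):
    """Split on whitespace and add sentence-boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram(sentences):
    """Count bigrams and how often each word occurs as a bigram history."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts


def lm_log_prob(sentence, history_counts, bigram_counts, vocab_size):
    """Add-one smoothed bigram estimate of log P(w_1 ... w_k)."""
    tokens = tokenize(sentence)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = history_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def decode(hypotheses, lm_score):
    """Return the hypothesis maximizing log P(x|w) + log P(w)."""
    return max(hypotheses, key=lambda w: hypotheses[w] + lm_score(w))


HISTORY_COUNTS, BIGRAM_COUNTS = train_bigram(CORPUS)

# The vocabulary covers training and hypothesis tokens (boundary markers
# included) so that smoothing assigns unseen bigrams a nonzero probability.
VOCAB = {t for s in list(CORPUS) + list(HYPOTHESES) for t in tokenize(s)}


def lm_score(sentence):
    return lm_log_prob(sentence, HISTORY_COUNTS, BIGRAM_COUNTS, len(VOCAB))


best = decode(HYPOTHESES, lm_score)
print(best)
```

Because the acoustic scores are set equal, the choice rests entirely on the language model term, which is the point of the slide. A real recognizer searches over a far larger hypothesis space and typically weights the two scores against each other.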
