Estimating Querying Decoding Language Model Algorithms Kenneth Heafield CarnegieMellon,UniversityofEdinburgh March 7, 2013 KennethHeafield LanguageModelAlgorithms Much of the computational cost is due to the language model. Estimating Querying Decoding MT is Expensive (cid:73) “Since decoding is very time-intensive” [Jehl et al, 2012] (cid:73) “Based on the amount of memory we can afford” [Wuebker et al, 2012] KennethHeafield LanguageModelAlgorithms Estimating Querying Decoding MT is Expensive (cid:73) “Since decoding is very time-intensive” [Jehl et al, 2012] (cid:73) “Based on the amount of memory we can afford” [Wuebker et al, 2012] Much of the computational cost is due to the language model. KennethHeafield LanguageModelAlgorithms Estimating Querying Decoding Desiderata (cid:73) More data (cid:73) Less CPU time (cid:73) Less RAM (cid:73) High Quality (cid:73) Search Accuracy (cid:73) BLEU KennethHeafield LanguageModelAlgorithms Estimating Querying Decoding Language Model Algorithms Estimating Text → ARPA file with probabilities. Querying ARPA file → efficient data structure. Decoding Searching for high-scoring translations. KennethHeafield LanguageModelAlgorithms IRSTLM does not estimate modified Kneser-Ney models. Also, segfaults. Estimating Sorting Querying ModifiedKneser-Ney Decoding Estimating: The Problem SRILM uses too much memory =⇒ limit on data size. Also, annoying to compile. KennethHeafield LanguageModelAlgorithms Estimating Sorting Querying ModifiedKneser-Ney Decoding Estimating: The Problem SRILM uses too much memory =⇒ limit on data size. Also, annoying to compile. IRSTLM does not estimate modified Kneser-Ney models. Also, segfaults. KennethHeafield LanguageModelAlgorithms count(wn) 1 (cid:40) count(wn) 1 if count(w ) > 0 s(w |wn−1) = count(wn−1) n n 1 0.4s(w1|wn−1) if count(w ) = 0 n 2 n NB: s does not sum to 1. Estimating Sorting Querying ModifiedKneser-Ney Decoding Stupid Backoff [Brants et al, 2007] 1. Count n-grams offline 2. Compute pseudo-probabilities at runtime KennethHeafield LanguageModelAlgorithms Estimating Sorting Querying ModifiedKneser-Ney Decoding Stupid Backoff [Brants et al, 2007] 1. Count n-grams offline count(wn) 1 2. Compute pseudo-probabilities at runtime (cid:40) count(wn) 1 if count(w ) > 0 s(w |wn−1) = count(wn−1) n n 1 0.4s(w1|wn−1) if count(w ) = 0 n 2 n NB: s does not sum to 1. KennethHeafield LanguageModelAlgorithms Runs out of RAM. Estimating Sorting Querying ModifiedKneser-Ney Decoding Counting n-grams <s> Australia is one of the few 5-gram Count <s> Australia is one of 1 Australia is one of the 1 is one of the few 1 Hash table? KennethHeafield LanguageModelAlgorithms
Description: