Perplexity and BLEU
Perplexity
\[\begin{aligned} \text{PPL}(w_{1:n})&=P(w_{1:n})^{-\frac{1}{n}} \\ &=\sqrt[n]{\prod_{i=1}^n\frac{1}{P(w_i|w_{<i})}} \end{aligned}\] \[\log\text{PPL}(w_{1:n})\approx-\frac{1}{n}\sum_{i=1}^n{\log{P(w_i|w_{<i};\theta)}}\]Minimizing PPL during training is equivalent to maximizing the likelihood (i.e., minimizing the negative log-likelihood).
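To make the formula concrete, here is a minimal sketch that computes PPL from a list of per-token conditional probabilities \(P(w_i|w_{<i})\); the probability values in the example are hypothetical.

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token conditional probabilities P(w_i | w_<i).

    Computed in log space as exp(-1/n * sum(log p)), matching the
    log-PPL formula above.
    """
    n = len(token_probs)
    log_ppl = -sum(math.log(p) for p in token_probs) / n
    return math.exp(log_ppl)

# Example: a 4-token sentence with hypothetical model probabilities.
probs = [0.2, 0.5, 0.1, 0.4]
print(perplexity(probs))  # ~3.98
```

Working in log space avoids numerical underflow from multiplying many small probabilities, which is why the log form of PPL is the one used in practice.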
Motivation
PPL reflects likelihood, but likelihood is measured under teacher forcing, so it does not fully reflect the quality of freely generated text.