Link

Teacher Forcing

Autoregressive Task

MLE and Teacher Forcing

\[\begin{gathered} \mathcal{D}=\{x^i,y^i\}_{i=1}^N \\ \begin{aligned} \hat{\theta}&=\underset{\theta\in\Theta}{\text{argmax}}\sum_{i=1}^N{\log{P(y^i|x^i;\theta)}} \\ &=\underset{\theta\in\Theta}{\text{argmax}}\sum_{i=1}^N{\sum_{j=1}^n{\log{P(y_j^i|x^i,y_{<j}^i;\theta)}}}, \end{aligned} \\ \text{where }y^i=y_{1:n}^i=\{y_1^i,\cdots,y_n^i\}. \end{gathered}\]