Autoregressive Models - GPT
Motivations
Overview
\[\begin{gathered}
\mathcal{D}=\{x_i\}_{i=1}^N, \\
\text{where }x_i=\{w_{i,1},\cdots,w_{i,n}\}.
\end{gathered}\]
\[\begin{gathered}
\mathcal{L}(\theta_\text{PLM})=-\sum_{i=1}^N{
\sum_{t=1}^n{
\log{P(w_{i,t}\mid w_{i,<t};\theta_\text{PLM})}
}
} \\
\\
\hat{\theta}_\text{PLM}=\underset{\theta_\text{PLM}\in\Theta}{\operatorname{argmin}}\;\mathcal{L}(\theta_\text{PLM})
\end{gathered}\]
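The objective above can be read as maximum likelihood estimation over \(\mathcal{D}\). As a brief sketch, assuming the samples \(x_i\) are treated as independent (an assumption not stated in the original), the chain rule of probability factorizes each sequence into next-token conditionals, and taking the negative logarithm of the dataset likelihood recovers \(\mathcal{L}(\theta_\text{PLM})\):
\[\begin{gathered}
P(x_i;\theta_\text{PLM})=\prod_{t=1}^n{P(w_{i,t}\mid w_{i,<t};\theta_\text{PLM})}, \\
-\log{\prod_{i=1}^N{P(x_i;\theta_\text{PLM})}}
=-\sum_{i=1}^N{\sum_{t=1}^n{\log{P(w_{i,t}\mid w_{i,<t};\theta_\text{PLM})}}}
=\mathcal{L}(\theta_\text{PLM}).
\end{gathered}\]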
Fine-tuning
Evaluations
Wrap-up