Encoder

Equations

Dataset

\[\begin{gathered} \mathcal{D}=\{(x^i,y^i)\}_{i=1}^N \\ x^i=\{x^i_1,\cdots,x^i_m\}\text{ and }y^i=\{y_0^i,y_1^i,\cdots,y_n^i\}, \\ \text{where }y_0^i=\text{<BOS>}\text{ and }y_n^i=\text{<EOS>}. \end{gathered}\]
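
A minimal sketch of how one pair in the dataset could be laid out, with the target wrapped so that its first token is `<BOS>` and its last is `<EOS>`. The helper name `make_pair` and the toy sentences are hypothetical and only serve as illustration.

```python
BOS, EOS = "<BOS>", "<EOS>"

def make_pair(source_tokens, target_tokens):
    """Return (x, y) with y wrapped as y_0 = <BOS>, ..., y_n = <EOS>."""
    x = list(source_tokens)
    y = [BOS] + list(target_tokens) + [EOS]
    return x, y

# Toy dataset D = {(x^i, y^i)}; the sentences are made up for illustration.
dataset = [
    make_pair(["ich", "bin", "hier"], ["i", "am", "here"]),
    make_pair(["danke"], ["thank", "you"]),
]

x0, y0 = dataset[0]
m = len(x0)        # source length
n = len(y0) - 1    # index of the last target token, since y runs from y_0 to y_n
```

Note that with this indexing the target holds n+1 tokens, and m and n may differ from pair to pair.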

What we want

\[\hat{y}_{1:n}=f(x_{1:m};\theta)\]

Encoder Block

\[\begin{gathered} h_{0,1:m}^\text{enc}=\text{emb}(x_{1:m})+\text{pos}(1,m) \\ \tilde{h}_{i,1:m}^\text{enc}=\text{LayerNorm}(\text{Multihead}_i(Q,K,V)+h_{i-1,1:m}^\text{enc}), \\ \text{where }Q=K=V=h_{i-1,1:m}^\text{enc}. \\ \end{gathered}\] \[\begin{gathered} \text{FFN}(h_{i,t})=\text{ReLU}(h_{i,t}\cdot{W_i^1})\cdot{W}_i^2 \\ \text{where }W_i^1\in\mathbb{R}^{d_\text{model}\times{d_\text{ff}}}\text{ and }W_i^2\in\mathbb{R}^{d_\text{ff}\times{d_\text{model}}}. \\ \\ h_{i,1:m}^\text{enc}=\text{LayerNorm}([\text{FFN}(\tilde{h}_{i,1}^\text{enc});\cdots;\text{FFN}(\tilde{h}_{i,m}^\text{enc})]+\tilde{h}_{i,1:m}^\text{enc}) \end{gathered}\]
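
The encoder block above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the equations do not spell out the internals of Multihead, so the projection matrices `Wq`, `Wk`, `Wv`, `Wo` and the scaled dot-product form are assumed from the usual Transformer formulation; layer normalization is shown without learned gain and bias; the FFN has no bias terms, matching the formula above. All function and parameter names are hypothetical.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Normalize each position over the feature axis (no learned gain/bias)."""
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multihead(q, k, v, p, n_heads):
    """Assumed scaled dot-product multi-head attention; q, k, v are (length, d_model)."""
    m, d_model = q.shape
    d_head = d_model // n_heads

    def split(a):
        # (length, d_model) -> (n_heads, length, d_head)
        return a.reshape(-1, n_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split(q @ p["Wq"]), split(k @ p["Wk"]), split(v @ p["Wv"])
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, m, m)
    ctx = softmax(scores) @ vh                             # (n_heads, m, d_head)
    concat = ctx.transpose(1, 0, 2).reshape(m, d_model)    # merge heads again
    return concat @ p["Wo"]

def ffn(h, p):
    """Position-wise FFN: ReLU(h W1) W2, applied to every row of h at once."""
    return np.maximum(h @ p["W1"], 0.0) @ p["W2"]

def encoder_block(h_prev, p, n_heads):
    """h_prev is h_{i-1,1:m}^enc with shape (m, d_model); returns h_{i,1:m}^enc."""
    # Self-attention: Q = K = V = h_prev, then residual connection and LayerNorm.
    h_tilde = layer_norm(multihead(h_prev, h_prev, h_prev, p, n_heads) + h_prev)
    # FFN on every position, then residual connection and LayerNorm.
    return layer_norm(ffn(h_tilde, p) + h_tilde)

def init_params(rng, d_model, d_ff):
    """Random parameters for one block (illustrative initialization only)."""
    def w(rows, cols):
        return rng.normal(scale=rows ** -0.5, size=(rows, cols))
    return {
        "Wq": w(d_model, d_model), "Wk": w(d_model, d_model),
        "Wv": w(d_model, d_model), "Wo": w(d_model, d_model),
        "W1": w(d_model, d_ff), "W2": w(d_ff, d_model),
    }

rng = np.random.default_rng(0)
m, d_model, d_ff, n_heads = 5, 8, 32, 2
h_in = rng.normal(size=(m, d_model))  # stands in for emb(x) + pos
h_out = encoder_block(h_in, init_params(rng, d_model, d_ff), n_heads)
```

The block maps an (m, d_model) array to an array of the same shape, which is what allows blocks to be stacked.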

Encoder

\[\begin{gathered} h_{\ell_\text{enc},1:m}^\text{enc}=\text{Block}_\text{enc}(h_{\ell_\text{enc}-1,1:m}^\text{enc}) \\ \cdots \\ h_{1,1:m}^\text{enc}=\text{Block}_\text{enc}(h_{0,1:m}^\text{enc}) \\ \end{gathered}\]
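
Read from bottom to top, the chain above is a plain composition of blocks, each with its own parameters. A minimal sketch, assuming every block preserves the (m, d_model) shape: the function `encode` and the stand-in `make_toy_block` are hypothetical, and the toy block is deliberately not a real encoder block.

```python
import numpy as np

def encode(h0, blocks):
    """Apply the blocks in order: h_i = blocks[i-1](h_{i-1}) for i = 1..l_enc."""
    h = h0
    for block in blocks:
        h = block(h)
    return h

def make_toy_block(weight):
    # Stand-in for Block_enc: shape-preserving, with its own parameter matrix.
    return lambda h: np.tanh(h @ weight) + h

rng = np.random.default_rng(0)
m, d_model, n_layers = 4, 6, 3
blocks = [
    make_toy_block(rng.normal(scale=0.1, size=(d_model, d_model)))
    for _ in range(n_layers)
]
h0 = rng.normal(size=(m, d_model))  # stands in for emb(x) + pos
h_final = encode(h0, blocks)        # corresponds to h_{l_enc,1:m}^enc
```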