Length & Coverage Penalty
Length Penalty
\[\begin{gathered}
\log{\tilde{P}(\hat{y}_{1:n}|x_{1:m};\theta)}=\log{P(\hat{y}_{1:n}|x_{1:m};\theta)}\times\text{penalty}(n) \\
\text{where }\hat{y}_{1:n}\sim{P(\cdot|x_{1:m};\theta)}\text{ and } \\
\text{penalty}(n)=\Big(\frac{1+\beta}{1+n}\Big)^\alpha.
\end{gathered}\]
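The rescoring above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the default values of `alpha` and `beta` are assumptions chosen for the example, not values given in this section.

```python
def length_penalty(n, alpha=0.2, beta=5.0):
    """penalty(n) = ((1 + beta) / (1 + n)) ** alpha.

    n is the hypothesis length; alpha and beta are hyperparameters
    (the defaults here are illustrative assumptions).
    """
    return ((1.0 + beta) / (1.0 + n)) ** alpha


def rescored_log_prob(log_prob, n, alpha=0.2, beta=5.0):
    """Length-adjusted score: log P(y | x) * penalty(n)."""
    return log_prob * length_penalty(n, alpha, beta)


# Two hypotheses with the same raw log-probability but different lengths.
short_score = rescored_log_prob(-4.0, n=10)
long_score = rescored_log_prob(-4.0, n=20)
print(short_score, long_score)
```

Because the log-probability is never positive, a factor below 1 moves the score toward zero. The factor equals 1 at n = β and shrinks as n grows, so longer hypotheses lose less from their accumulated log-probability.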
Coverage Penalty
\[\begin{gathered}
\log{\tilde{P}(\hat{y}_{1:n}|x_{1:m};\theta)}=\log{P(\hat{y}_{1:n}|x_{1:m};\theta)}\times\text{penalty}_\text{length}(n)+\text{penalty}_\text{coverage}(x_{1:m},\hat{y}_{1:n}) \\
\\
\text{penalty}_\text{coverage}(x_{1:m},\hat{y}_{1:n})=\beta\times{
\sum_{i=1}^m{
\log{\big(
\min(
\sum_{j=1}^n{
w_{i,j}
}, 1.0
)
\big)}
}
}, \\
\text{where }w_{i,j}=\text{softmax}_i(h_j^\text{dec}\cdot{W_\text{a}}\cdot{h_i^\text{enc}}^T).
\end{gathered}\]
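The coverage term can be sketched the same way. This is a hedged illustration under several assumptions: the attention weights are passed as a nested list with one row per decoder step, each row is normalized over the source positions, and β here is read as the coverage weight, separate from the β inside the length penalty. The function names, the default `beta`, and the `eps` guard are not part of the formula.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coverage_penalty(attention, beta=0.2, eps=1e-12):
    """beta * sum_i log(min(sum_j w_ij, 1.0)).

    attention[j][i] holds w_ij: the weight that decoder step j puts on
    source token i. eps is an added guard against log(0), an assumption
    of this sketch rather than part of the formula.
    """
    m = len(attention[0])
    total = 0.0
    for i in range(m):
        covered = sum(step[i] for step in attention)  # sum over j
        total += math.log(max(min(covered, 1.0), eps))
    return beta * total


def total_score(log_prob, length_factor, attention, beta=0.2):
    """log P * penalty_length(n) + penalty_coverage(x, y)."""
    return log_prob * length_factor + coverage_penalty(attention, beta)


# Two decoder steps over two source tokens; the second token is under-attended.
attention = [[1.0, 0.0], [0.5, 0.5]]
print(coverage_penalty(attention, beta=1.0))
```

A source token whose attention weights sum to at least 1 across all decoder steps contributes log(1) = 0. Only under-attended tokens contribute a negative term, so the penalty lowers the score of hypotheses that skip parts of the source.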