\[\mathcal{D}_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{N}, \qquad |\mathcal{D}_{\text{train}}| = |\mathcal{D}_{\text{dev}}| = N\]
\[x_{\text{in}} = \begin{cases} x_1, & \text{if the input is a single sentence} \\ (x_1, x_2), & \text{if the input is a pair of sentences} \end{cases}\]
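\[\mathcal{M}: \mathcal{Y} \rightarrow \mathcal{V}, \quad \text{where } \mathcal{M} \text{ is a mapping function from class labels to words in the vocabulary } \mathcal{V}.\]
\[x_{\text{prompt}} = \mathcal{T}(x_{\text{in}}), \quad \text{where } x_{\text{prompt}} \text{ contains exactly one } \texttt{[MASK]} \text{ token.}\]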
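To make the roles of the template and the label-word mapping concrete, the following is a minimal Python sketch. The template wording, the label set, and the label words are illustrative assumptions, not values taken from the text; the names `apply_template` and `LABEL_WORDS` are hypothetical.

```python
MASK = "[MASK]"

# Hypothetical label-word mapping M: Y -> V for a binary sentiment task.
LABEL_WORDS = {"positive": "great", "negative": "terrible"}


def apply_template(x_in):
    """Sketch of T(x_in): build a prompt containing exactly one [MASK] token.

    x_in is either a single sentence (str) or a pair of sentences (tuple).
    The template strings below are assumptions chosen for illustration.
    """
    if isinstance(x_in, str):
        # Single-sentence input: x_in = x1
        return f"{x_in} It was {MASK}."
    # Sentence-pair input: x_in = (x1, x2)
    x1, x2 = x_in
    return f"{x1} ? {MASK} , {x2}"


single = apply_template("No reason to watch.")
pair = apply_template(("A man is sleeping.", "A person is resting."))
print(single)
print(pair)
```

Both branches return a prompt with a single `[MASK]` position, which is the slot the model fills with one of the label words.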
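\[P(y \mid x_{\text{in}}) = P\left(\texttt{[MASK]} = \mathcal{M}(y) \mid x_{\text{prompt}}\right) = \frac{\exp\left(\mathbf{w}_{\mathcal{M}(y)} \cdot \mathbf{h}_{\texttt{[MASK]}}\right)}{\sum_{y' \in \mathcal{Y}} \exp\left(\mathbf{w}_{\mathcal{M}(y')} \cdot \mathbf{h}_{\texttt{[MASK]}}\right)}\]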
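The class probability is a softmax over the label words only, not over the full vocabulary. A minimal sketch in plain Python, assuming the hidden state at the `[MASK]` position and the output vectors of the label words are already available as lists of floats (the numeric values and the function name `label_probabilities` are made up for illustration):

```python
import math


def label_probabilities(h_mask, label_word_vectors):
    """Softmax over label words: P(y | x_in) is proportional to exp(w_{M(y)} . h_[MASK]).

    h_mask: hidden vector at the [MASK] position.
    label_word_vectors: dict mapping each label y to the vector w_{M(y)}.
    """
    # Dot product w_{M(y)} . h_[MASK] for every label y in Y.
    scores = {
        y: sum(w_i * h_i for w_i, h_i in zip(w, h_mask))
        for y, w in label_word_vectors.items()
    }
    # Subtract the maximum score for numerical stability; the ratio is unchanged.
    max_score = max(scores.values())
    exps = {y: math.exp(s - max_score) for y, s in scores.items()}
    normalizer = sum(exps.values())
    return {y: e / normalizer for y, e in exps.items()}


# Toy values, assumed for illustration only.
h_mask = [0.5, -1.0, 2.0]
label_word_vectors = {
    "positive": [1.0, 0.0, 1.0],   # w_{M(positive)}
    "negative": [0.0, 1.0, -1.0],  # w_{M(negative)}
}
probs = label_probabilities(h_mask, label_word_vectors)
print(probs)
```

Because the normalizer sums over the labels in Y rather than over all of V, the returned probabilities sum to one across classes.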