
Few-shot Learning with Smaller PLM

Pattern-Exploiting Training (PET)

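\[\mathcal{D}_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{N}, \qquad |\mathcal{D}_{\text{train}}| = |\mathcal{D}_{\text{dev}}| = N\]

\[x_{\text{in}} = \begin{cases} x_1, & \text{if a single sentence} \\ (x_1, x_2), & \text{if a pair of sentences} \end{cases}\]

\[\begin{gathered} \mathcal{M}: \mathcal{Y} \rightarrow \mathcal{V}, \text{ where } \mathcal{M} \text{ is a mapping function from class labels to words in the vocabulary } \mathcal{V}. \\ x_{\text{prompt}} = \mathcal{T}(x_{\text{in}}), \text{ where } x_{\text{prompt}} \text{ contains exactly one [MASK] token.} \end{gathered}\]

\[P(y \mid x_{\text{in}}) = P(\text{[MASK]} = \mathcal{M}(y) \mid x_{\text{prompt}}) = \frac{\exp\left(w_{\mathcal{M}(y)} \cdot h_{\text{[MASK]}}\right)}{\sum_{y' \in \mathcal{Y}} \exp\left(w_{\mathcal{M}(y')} \cdot h_{\text{[MASK]}}\right)}\]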
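The last formula restricts the softmax at the [MASK] position to the label words only. A minimal sketch of that step, assuming the per-word logits \(w_v \cdot h_{\text{[MASK]}}\) have already been computed by the PLM; the function name, the label words, and the logit values are hypothetical and chosen only for illustration:

```python
import math

def label_probabilities(mask_logits, label_words):
    """Softmax over label words only.

    mask_logits: dict mapping a vocabulary word v to its logit w_v . h_[MASK]
    label_words: dict mapping a class label y to its label word M(y)
    """
    # Keep only the logits of the label words M(y), one per class.
    scores = {y: mask_logits[word] for y, word in label_words.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

# Hypothetical logits at the [MASK] position (not real model output).
mask_logits = {"great": 2.0, "terrible": 0.0, "the": 5.0}
label_words = {"positive": "great", "negative": "terrible"}
probs = label_probabilities(mask_logits, label_words)
```

Note that the word "the" has the largest logit but does not affect the result, because the normalization runs over \(\mathcal{Y}\) rather than over the whole vocabulary.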

Regression

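\[\begin{aligned} y &= v_{\text{low}} \times P(\mathcal{M}(v_{\text{low}}) \mid x_{\text{in}}) + v_{\text{high}} \times P(\mathcal{M}(v_{\text{high}}) \mid x_{\text{in}}) \\ &= v_{\text{low}} \times \left(1 - P(\mathcal{M}(v_{\text{high}}) \mid x_{\text{in}})\right) + v_{\text{high}} \times P(\mathcal{M}(v_{\text{high}}) \mid x_{\text{in}}) \end{aligned}\]

\[P(\mathcal{M}(v_{\text{high}}) \mid x_{\text{in}}) = \frac{\exp\left(w_{\mathcal{M}(v_{\text{high}})} \cdot h_{\text{[MASK]}}\right)}{\sum_{w' \in \{\mathcal{M}(v_{\text{low}}), \mathcal{M}(v_{\text{high}})\}} \exp\left(w_{w'} \cdot h_{\text{[MASK]}}\right)}\]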
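The regression case can be read as a two-way softmax over the two label words, followed by interpolation between the two bounds. A small sketch under that reading; the function name and the example logits are hypothetical, and the two logits stand for \(w_{\mathcal{M}(v_{\text{low}})} \cdot h_{\text{[MASK]}}\) and \(w_{\mathcal{M}(v_{\text{high}})} \cdot h_{\text{[MASK]}}\):

```python
import math

def predict_regression(logit_low, logit_high, v_low, v_high):
    """Interpolate between v_low and v_high using P(M(v_high) | x_in)."""
    # Two-way softmax over {M(v_low), M(v_high)}, shifted by the max for stability.
    top = max(logit_low, logit_high)
    e_low = math.exp(logit_low - top)
    e_high = math.exp(logit_high - top)
    p_high = e_high / (e_low + e_high)
    # y = v_low * (1 - p_high) + v_high * p_high
    return v_low * (1.0 - p_high) + v_high * p_high

# Equal logits give p_high = 0.5, so the prediction is the midpoint of the range.
midpoint = predict_regression(0.0, 0.0, v_low=0.0, v_high=5.0)
# A much larger logit for the high label word pushes the prediction toward v_high.
near_high = predict_regression(0.0, 10.0, v_low=0.0, v_high=5.0)
```

Because \(P(\mathcal{M}(v_{\text{low}}) \mid x_{\text{in}}) = 1 - P(\mathcal{M}(v_{\text{high}}) \mid x_{\text{in}})\), the prediction always stays inside \([v_{\text{low}}, v_{\text{high}}]\).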

Training with examples as demonstrations

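\[\mathcal{T}(x_{\text{in}}) = \tilde{\mathcal{T}}(x_{\text{in}}, \text{[MASK]})\]

\[\mathcal{T}(x_{\text{in}}) ;\ \tilde{\mathcal{T}}\left(x^{(1)}, \mathcal{M}(y^{(1)})\right) ;\ \cdots ;\ \tilde{\mathcal{T}}\left(x^{(|\mathcal{Y}|)}, \mathcal{M}(y^{(|\mathcal{Y}|)})\right)\]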
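The concatenation above can be sketched as plain string building: the query keeps its [MASK] slot, while each demonstration has the slot filled with the label word of its own class. The template "It was ___." and the example sentences and label words below are assumptions made for illustration, not something fixed by the formulas:

```python
def template(x, word="[MASK]"):
    """Hypothetical template T~(x, word): append a cloze phrase to the input."""
    return f"{x} It was {word}."

def build_prompt(x_in, demonstrations, label_words, sep=" "):
    """Concatenate T(x_in) with one filled-in demonstration per class.

    demonstrations: list of (x, y) pairs, one sampled example per class
    label_words: dict mapping a class label y to its label word M(y)
    """
    # T(x_in) = T~(x_in, [MASK]): the only part that keeps the [MASK] token.
    parts = [template(x_in)]
    for x_demo, y_demo in demonstrations:
        # T~(x^(c), M(y^(c))): the slot is filled with the label word.
        parts.append(template(x_demo, label_words[y_demo]))
    return sep.join(parts)

label_words = {"positive": "great", "negative": "terrible"}
demonstrations = [("A fun ride.", "positive"), ("A dull mess.", "negative")]
prompt = build_prompt("No reason to watch.", demonstrations, label_words)
```

The resulting prompt still contains exactly one [MASK] token, so the classification probability is computed at that position in the same way as without demonstrations.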