Autoencoding Models - BERT

Previous Methods

Methodology

Masked Language Model

Next Sentence Prediction

Embedding Combination

Fine-tuning

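\[P(\cdot \mid x; \theta_{\mathrm{PLM}}, W) = \mathrm{softmax}(hW) = \mathrm{softmax}\big(\mathrm{PLM}(x; \theta_{\mathrm{PLM}})\,W\big), \quad \text{where } h \in \mathbb{R}^{\text{hidden\_size}} \text{ and } W \in \mathbb{R}^{\text{hidden\_size} \times \#\text{classes}}.\]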
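
The classification head in the formula above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the vector `h` stands in for the pooled output PLM(x; θ_PLM), the numbers in `h` and `W` are toy values rather than weights from any real model, and the helper names `softmax` and `classify` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h, W):
    """Return the class distribution softmax(hW).

    h: list of length hidden_size (assumed stand-in for PLM(x; theta_PLM)).
    W: hidden_size x num_classes matrix, given as a list of rows.
    """
    num_classes = len(W[0])
    # logits[c] = sum_k h[k] * W[k][c], i.e. the row vector h times W.
    logits = [sum(h[k] * W[k][c] for k in range(len(h)))
              for c in range(num_classes)]
    return softmax(logits)


# Toy example: hidden_size = 3, #classes = 2 (values are made up).
h = [0.5, -1.0, 2.0]
W = [[0.1, 0.3],
     [0.2, -0.4],
     [-0.5, 0.6]]
probs = classify(h, W)
```

During fine-tuning, both W and θ_PLM would be updated by gradient descent on the cross-entropy of `probs` against the gold label; the sketch covers only the forward pass.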

Evaluations

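\[D = \{(x_i, s_i, e_i)\}_{i=1}^{N}, \quad \text{where } x_i \text{ is the input, } s_i \text{ is the start index of the span, and } e_i \text{ is its end index.}\]

\[L_{\mathrm{span}}(\theta_{\mathrm{PLM}}, S, E) = -\sum_{i=1}^{N} \log P(s_i \mid x_i; \theta_{\mathrm{PLM}}, S) - \sum_{i=1}^{N} \log P(e_i \mid x_i; \theta_{\mathrm{PLM}}, E)\]

\[P(s_i \mid x_i; \theta_{\mathrm{PLM}}, S) = \exp(S \ldots\]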
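
A span-extraction loss of this kind can be sketched as follows. The source formula for P(s_i | x_i; θ_PLM, S) is cut off, so the sketch assumes the form used in the original BERT paper: a softmax over token positions of the dot product between the start vector S (or end vector E) and each token representation. The token vectors stand in for PLM outputs and are toy values; `dot`, `log_softmax`, and `span_loss` are hypothetical helper names.

```python
import math


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def span_loss(dataset, S, E):
    """Summed negative log-likelihood of gold start and end indices.

    dataset: list of (token_vectors, start_index, end_index), where
             token_vectors is an assumed stand-in for PLM(x_i; theta_PLM).
    S, E:    start and end vectors of length hidden_size.
    """
    loss = 0.0
    for tokens, s_i, e_i in dataset:
        # Assumption: position scores are S . T_j and E . T_j.
        start_scores = [dot(S, t) for t in tokens]
        end_scores = [dot(E, t) for t in tokens]
        loss -= log_softmax(start_scores)[s_i]
        loss -= log_softmax(end_scores)[e_i]
    return loss


# One toy example: 3 tokens, hidden_size = 2, gold span = tokens 0..2.
data = [([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 0, 2)]
uniform_loss = span_loss(data, [0.0, 0.0], [0.0, 0.0])
better_loss = span_loss(data, [2.0, -2.0], [1.0, 1.0])
```

With all-zero S and E every position is equally likely, so the loss reduces to 2·log(3) for this three-token example. Vectors that score the gold start and end tokens higher give a smaller loss.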

Wrap-up