
Knowledge Distillation


Given a labeled dataset \(D=\{(x_i,y_i)\}_{i=1}^{N}\), the teacher is first trained by maximum likelihood at temperature \(\tau=1\):

\[\hat{\theta}_T,\hat{W}_T=\arg\max_{\theta_T\in\Theta,\;W_T\in\mathcal{W}}\sum_{i=1}^{N}\log P(y_i\mid x_i;\theta_T,W_T,\tau=1)\]

where the predictive distribution is a temperature-scaled softmax over the logits, with \(h_i=f(x_i;\theta)\) the model's representation of \(x_i\):

\[P(\cdot\mid x_i;\theta,W,\tau)=\operatorname{softmax}\!\left(\frac{W f(x_i;\theta)}{\tau}\right)=\operatorname{softmax}\!\left(\frac{W h_i}{\tau}\right).\]

The distillation loss is the cross-entropy between the teacher's softened distribution and the student's, which approximates an expectation over the data distribution:

\[\mathcal{L}_{\mathrm{KD}}(\theta_S,W_S)=-\sum_{i=1}^{N}\sum_{c\in C}P(y=c\mid x_i;\hat{\theta}_T,\hat{W}_T,\tau)\log P(y=c\mid x_i;\theta_S,W_S,\tau)\approx-\mathbb{E}_{x\sim P(x)}\!\left[\mathbb{E}_{y\sim P(\cdot\mid x;\hat{\theta}_T,\hat{W}_T,\tau)}\!\left[\log P(y\mid x;\theta_S,W_S,\tau)\right]\right]\]

The student is also trained on the hard labels with the standard cross-entropy loss:

\[\mathcal{L}_{\mathrm{CE}}(\theta_S,W_S)=-\sum_{i=1}^{N}\log P(y_i\mid x_i;\theta_S,W_S)\]

and the final objective interpolates the two with a weight \(\alpha\in[0,1]\):

\[\mathcal{L}(\theta_S,W_S)=(1-\alpha)\,\mathcal{L}_{\mathrm{CE}}(\theta_S,W_S)+\alpha\,\mathcal{L}_{\mathrm{KD}}(\theta_S,W_S)\qquad\hat{\theta}_S,\hat{W}_S=\arg\min_{\theta_S\in\Theta,\;W_S\in\mathcal{W}}\mathcal{L}(\theta_S,W_S)\]
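The objective above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function and argument names (`distillation_loss`, `tau`, `alpha`) and the specific values used are assumptions for the example, and the logits stand in for \(W h_i\).

```python
import numpy as np

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax over the last axis; shifting by the max
    # is for numerical stability and does not change the result.
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, tau=2.0, alpha=0.5):
    """(1 - alpha) * L_CE + alpha * L_KD, summed over a batch.

    student_logits, teacher_logits: (N, C) arrays of class logits (W h_i).
    labels: (N,) integer ground-truth classes y_i.
    tau, alpha: illustrative temperature and mixing weight (assumed values).
    """
    n = student_logits.shape[0]
    # L_KD: cross-entropy against the teacher's temperature-softened distribution.
    p_teacher = softmax(teacher_logits, tau)
    log_p_student_tau = np.log(softmax(student_logits, tau))
    l_kd = -np.sum(p_teacher * log_p_student_tau)
    # L_CE: standard cross-entropy against the hard labels (tau = 1).
    log_p_student = np.log(softmax(student_logits, 1.0))
    l_ce = -np.sum(log_p_student[np.arange(n), labels])
    return (1 - alpha) * l_ce + alpha * l_kd
```

With `alpha = 0` this reduces to plain cross-entropy training of the student; with `alpha = 1` the student learns only from the teacher's soft targets.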