Using teacher knowledge at inference time to enhance student model
Knowledge distillation (KD) is one of the most effective ways to deploy large-scale language models in environments where low latency is essential. KD involves transferring the knowledge of a large, accurate teacher model to a smaller, faster student model.
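To make the transfer concrete, here is a minimal sketch of the classic soft-label distillation objective (in the style of Hinton et al.), not the inference-time method described in this article; the function name, `temperature`, and `alpha` are illustrative choices, assuming PyTorch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between the temperature-scaled
    # student and teacher output distributions.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha weights the distillation signal.
    return alpha * kd + (1 - alpha) * ce
```

During training only the student's parameters are updated; the teacher runs in evaluation mode to supply the soft targets, and the `temperature ** 2` factor keeps gradient magnitudes comparable across temperature settings.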