Using teacher knowledge at inference time to enhance student model
Knowledge distillation (KD) is one of the most effective ways to deploy large-scale language models in environments where low latency is essential. KD involves transferring the knowledge of a large, accurate teacher model to a smaller, faster student model.
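To make the transfer concrete, here is a minimal sketch of the classic soft-label distillation objective (in the style of Hinton et al.), not the inference-time method described in this article; the function name, `temperature`, and `alpha` are illustrative choices, assuming PyTorch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between the temperature-scaled
    # student and teacher output distributions.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha weights the distillation signal.
    return alpha * kd + (1 - alpha) * ce
```

During training only the student's parameters are updated; the teacher runs in evaluation mode to supply the soft targets, and the `temperature ** 2` factor keeps gradient magnitudes comparable across temperature settings.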