Events & Conferences2 years ago
Better foundation models for video representation
Recent foundation models — such as large language models — have achieved state-of-the-art performance by learning to reconstruct randomly masked-out text or images. Without any human...