Connect with us

Wenyi Wu

Events & Conferences1 year ago

Vision-language models that can handle multi-image inputs

Vision-language models, which map images and text to a common representational space, have demonstrated remarkable performance on a wide range of multimodal AI tasks. But they’re...

More Posts