AI Research

NVIDIA Research at ICLR — the Next Wave of Multimodal Generative AI

Published

2 months ago

April 24, 2025

Advancing AI requires a full-stack approach, with a powerful foundation of computing infrastructure — including accelerated processors and networking technologies — connected to optimized compilers, algorithms and applications.

NVIDIA Research is innovating across this spectrum, supporting virtually every industry in the process. At this week’s International Conference on Learning Representations (ICLR), taking place April 24-28 in Singapore, more than 70 NVIDIA-authored papers introduce AI developments with applications in autonomous vehicles, healthcare, multimodal content creation, robotics and more.

“ICLR is one of the world’s most impactful AI conferences, where researchers introduce important technical innovations that move every industry forward,” said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. “The research we’re contributing this year aims to accelerate every level of the computing stack to amplify the impact and utility of AI across industries.”

Research That Tackles Real-World Challenges

Several NVIDIA-authored papers at ICLR cover groundbreaking work in multimodal generative AI and novel methods for AI training and synthetic data generation, including:

Fugatto: The world’s most flexible audio generative AI model, Fugatto generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files. Other NVIDIA models at ICLR improve audio large language models (LLMs) to better understand speech.

HAMSTER: This paper demonstrates that a hierarchical design for vision-language-action models can improve their ability to transfer knowledge from off-domain fine-tuning data — inexpensive data that doesn’t need to be collected on actual robot hardware — to improve a robot’s skills in testing scenarios.

Hymba: This family of small language models uses a hybrid model architecture to create LLMs that blend the benefits of transformer models and state space models, enabling high-resolution recall, efficient context summarization and common-sense reasoning tasks. With its hybrid approach, Hymba improves throughput by 3x and reduces cache by almost 4x without sacrificing performance.

LongVILA: This training pipeline enables efficient visual language model training and inference for long video understanding. Training AI models on long videos is compute and memory-intensive — so this paper introduces a system that efficiently parallelizes long video training and inference, with training scalability up to 2 million tokens on 256 GPUs. LongVILA achieves state-of-the-art performance across nine popular video benchmarks.

LLaMaFlex: This paper introduces a new zero-shot generation technique to create a family of compressed LLMs based on one large model. The researchers found that LLaMaFlex can generate compressed models that are as accurate or better than state-of-the art pruned, flexible and trained-from-scratch models — a capability that could be applied to significantly reduce the cost of training model families compared to techniques like pruning and knowledge distillation.

Proteina: This model can generate diverse and designable protein backbones, the framework that holds a protein together. It uses a transformer model architecture with up to 5x as many parameters as previous models.

SRSA: This framework addresses the challenge of teaching robots new tasks using a preexisting skill library — so instead of learning from scratch, a robot can apply and adapt its existing skills to the new task. By developing a framework to predict which preexisting skill would be most relevant to a new task, the researchers were able to improve zero-shot success rates on unseen tasks by 19%.

STORM: This model can reconstruct dynamic outdoor scenes — like cars driving or trees swaying in the wind — with a precise 3D representation inferred from just a few snapshots. The model, which can reconstruct large-scale outdoor scenes in 200 milliseconds, has potential applications in autonomous vehicle development.

Discover the latest work from NVIDIA Research, a global team of around 400 experts in fields including computer architecture, generative AI, graphics, self-driving cars and robotics.

Source link

Related Topics:

Up Next
Improving brain models with ZAPBench

Don't Miss
Introducing Mobility AI: Advancing urban transportation

Isha Salian

Continue Reading

You may like

Click to comment

Leave a Reply
Cancel reply
Your email address will not be published. Required fields are marked *
Comment *
Name *

Email *

Website

Save my name, email, and website in this browser for the next time I comment.

AI Research

New Research Shows Language Choice Alone Can Guide AI Output Toward Eastern or Western Cultural Outlooks

Published
11 minutes ago
on
July 8, 2025

By
Irfan Ahmad

A new study shows that the language used to prompt AI chatbots can steer them toward different cultural mindsets, even when the question stays the same. Researchers at MIT and Tongji University found that large language models like OpenAI’s GPT and China’s ERNIE change their tone and reasoning depending on whether they’re responding in English or Chinese.

The results indicate that these systems translate language while also reflecting cultural patterns. These patterns appear in how the models provide advice, interpret logic, and handle questions related to social behavior.

Same Question, Different Outlook

The team tested both GPT and ERNIE by running identical tasks in English and Chinese. Across dozens of prompts, they found that when GPT answered in Chinese, it leaned more toward community-driven values and context-based reasoning. In English, its responses tilted toward individualism and sharper logic.

Take social orientation, for instance. In Chinese, GPT was more likely to favor group loyalty and shared goals. In English, it shifted toward personal independence and self-expression. These patterns matched well-documented cultural divides between East and West.

When it came to reasoning, the shift continued. The Chinese version of GPT gave answers that accounted for context, uncertainty, and change over time. It also offered more flexible interpretations, often responding with ranges or multiple options instead of just one answer. In contrast, the English version stuck to direct logic and clearly defined outcomes.

No Nudging Needed

What’s striking is that these shifts occurred without any cultural instructions. The researchers didn’t tell the models to act more “Western” or “Eastern.” They simply changed the input language. That alone was enough to flip the models’ behavior, almost like switching glasses and seeing the world in a new shade.

To check how strong this effect was, the researchers repeated each task more than 100 times. They tweaked prompt formats, varied the examples, and even changed gender pronouns. No matter what they adjusted, the cultural patterns held steady.

Real-World Impact

The study didn’t stop at lab tests. In a separate exercise, GPT was asked to choose between two ad slogans, one that stressed personal benefit, another that highlighted family values. When the prompt came in Chinese, GPT picked the group-centered slogan most of the time. In English, it leaned toward the one focused on the individual.

This might sound small, but it shows how language choice can guide the model’s output in ways that ripple into marketing, decision-making, and even education. People using AI tools in one language may get very different advice than someone asking the same question in another.

Can You Steer It?

The researchers also tested a workaround. They added cultural prompts, telling GPT to imagine itself as a person raised in a specific country. That small nudge helped the model shift its tone, even in English, suggesting that cultural context can be dialed up or down depending on how the prompt is framed.

Why It Matters

The findings concern how language affects the way AI models present information. Differences in response patterns suggest that the input language influences how content is structured and interpreted. As AI tools become more integrated into routine tasks and decision-making processes, language-based variations in output may influence user choices over time.

Notes: This post was edited/created using GenAI tools. Image: DIW-Aigen.

Read next: Jack Dorsey Builds Offline Messaging App That Uses Bluetooth Instead of the Internet

Source link

Continue Reading

AI Research

Indonesian volcano Mount Lewotobi Laki-laki spews massive ash cloud as it erupts again

Published
30 minutes ago
on
July 8, 2025

By
Kelly Ng

Indonesia’s Mount Lewotobi Laki-laki has begun erupting again – at one point shooting an ash cloud 18km (11mi) into the sky – as residents flee their homes once more.

There have been no reports of casualties since Monday morning, when the volcano on the island of Flores began spewing ash and lava again. Authorities have placed it on the highest alert level since an earlier round of eruptions three weeks ago.

At least 24 flights to and from the neighbouring resort island of Bali were cancelled on Monday, though some flights had resumed by Tuesday morning.

The initial column of hot clouds that rose at 11:05 (03:05 GMT) Monday was the volcano’s highest since November, said geology agency chief Muhammad Wafid.

“An eruption of that size certainly carries a higher potential for danger, including its impact on aviation,” Wafid told The Associated Press.

Monday’s eruption, which was accompanied by a thunderous roar, led authorities to enlarge the exclusion zone to a 7km radius from the central vent. They also warned of potential lahar floods – a type of mud or debris flow of volcanic materials – if heavy rain occurs.

The twin-peaked volcano erupted again at 19:30 on Monday, sending ash clouds and lava up to 13km into the air. It erupted a third time at 05:53 on Tuesday at a reduced intensity.

Videos shared overnight show glowing red lava spurting from the volcano’s peaks as residents get into cars and buses to flee.

More than 4,000 people have been evacuated from the area so far, according to the local disaster management agency.

Residents who have stayed put are facing a shortage of water, food and masks, local authorities say.

“As the eruption continues, with several secondary explosions and ash clouds drifting westward and northward, the affected communities who have not been relocated… require focused emergency response efforts,” say Paulus Sony Sang Tukan, who leads the Pululera village, about 8km from Lewotobi Laki-laki.

“Water is still available, but there’s concern about its cleanliness and whether it has been contaminated, since our entire area was blanketed in thick volcanic ash during yesterday’s [eruptions],” he said.

Indonesia sits on the Pacific “Ring of Fire” where tectonic plates collide, causing frequent volcanic activity as well as earthquakes.

Lewotobi Laki-laki has erupted multiple times this year – no casualties have been reported so far.

However, an eruption last November killed at least ten people and forced thousands to flee.

Laki-Laki, which means “man” in Indonesian, is twinned with the calmer but taller 1,703m named Perempuan, the Indonesian word for “woman”.

Additional reporting by Eliazar Ballo in Kupang.

Source link

Continue Reading

AI Research

What makes a good AI prompt? Here are 4 expert tips

Published
31 minutes ago
on
July 8, 2025

By
Kai Riemer

“And do you work well with AI?”

As tools such as ChatGPT, Copilot and other generative artificial intelligence (AI) systems become part of everyday workflows, more companies are looking for employees who can answer “yes” to this question. In other words, people who can prompt effectively, think with AI, and use it to boost productivity.

In fact, in a growing number of roles, being “AI fluent” is quickly becoming as important as being proficient in office software once was.

But we’ve all had that moment when we’ve asked an AI chatbot a question and received what feels like the most generic, surface level answer. The problem isn’t the AI – you just haven’t given it enough to work with.

Think of it this way. During training, the AI will have “read” virtually everything on the internet. But because it makes predictions, it will give you the most probable, most common response. Without specific guidance, it’s like walking into a restaurant and asking for something good. You’ll likely get the chicken.

Your solution lies in understanding that AI systems excel at adapting to context, but you have to provide it. So how exactly do you do that?

Crafting better prompts

You may have heard the term “prompt engineering”. It might sound like you need to design some kind of technical script to get results.

But today’s chatbots are great at human conversation. The format of your prompt is not that important. The content is.

To get the most out of your AI conversations, it’s important that you convey a few basics about what you want, and how you want it. Our approach follows the acronym CATS – context, angle, task and style.

Context means providing the setting and background information the AI needs. Instead of asking “How do I write a proposal?” try “I’m a nonprofit director writing a grant proposal to a foundation that funds environmental education programs for urban schools”. Upload relevant documents, explain your constraints, and describe your specific situation.

Angle (or attitude) leverages AI’s strength in role-playing and perspective-taking. Rather than getting a neutral response, specify the attitude you want. For example, “Act as a critical peer reviewer and identify weaknesses in my argument” or “Take the perspective of a supportive mentor helping me improve this draft”.

Task is specifically about what you actually want the AI to do. “Help me with my presentation” is vague. But “Give me three ways to make my opening slide more engaging for an audience of small business owners” is actionable.

Style harnesses AI’s ability to adapt to different formats and audiences. Specify whether you want a formal report, a casual email, bullet points for executives, or an explanation suitable for teenagers. Tell the AI what voice you want to use – for example, a formal academic style, technical, engaging or conversational.

In a growing number of roles, being able to use AI is quickly becoming as important as being proficient in office software once was.
Shutterstock

Context is everything

Besides crafting a clear, effective prompt, you can also focus on managing the surrounding information – that is to say on “context engineering”. Context engineering refers to everything that surrounds the prompt.

That means thinking about the environment and information the AI has access to: its memory function, instructions leading up to the task, prior conversation history, documents you upload, or examples of what good output looks like.

You should think about prompting as a conversation. If you’re not happy with the first response, push for more, ask for changes, or provide more clarifying information.

Don’t expect the AI to give a ready-made response. Instead, use it to trigger your own thinking. If you feel the AI has produced a lot of good material but you get stuck, copy the best parts into a fresh session and ask it to summarise and continue from there.

Keeping your wits

A word of caution though. Don’t get seduced by the human-like conversation abilities of these chatbots.

Always retain your professional distance and remind yourself that you are the only thinking part in this relationship. And always make sure to check the accuracy of anything an AI produces – errors are increasingly common.

AI systems are remarkably capable, but they need you – and human intelligence – to bridge the gap between their vast generic knowledge and your particular situation. Give them enough context to work with, and they might surprise you with how helpful they can be.

Source link

Continue Reading

Trending

Funding & Business1 week ago

Kayak and Expedia race to build AI travel agents that turn social posts into itineraries

Jobs & Careers7 days ago

Mumbai-based Perplexity Alternative Has 60k+ Users Without Funding

Mergers & Acquisitions7 days ago

Donald Trump suggests US government review subsidies to Elon Musk’s companies

Funding & Business7 days ago

Rethinking Venture Capital’s Talent Pipeline

Jobs & Careers7 days ago

Why Agentic AI Isn’t Pure Hype (And What Skeptics Aren’t Seeing Yet)

Funding & Business4 days ago

Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%

Jobs & Careers7 days ago

Astrophel Aerospace Raises ₹6.84 Crore to Build Reusable Launch Vehicle

Jobs & Careers7 days ago

Telangana Launches TGDeX—India’s First State‑Led AI Public Infrastructure

Funding & Business1 week ago

From chatbots to collaborators: How AI agents are reshaping enterprise work

Funding & Business5 days ago

Dust hits $6M ARR helping enterprises build AI agents that actually do stuff instead of just talking

aistoriz.com

NVIDIA Research at ICLR — the Next Wave of Multimodal Generative AI

Research That Tackles Real-World Challenges

You may like

Leave a Reply Cancel reply

Leave a Reply

AI Research

New Research Shows Language Choice Alone Can Guide AI Output Toward Eastern or Western Cultural Outlooks

Same Question, Different Outlook

No Nudging Needed

Real-World Impact

Can You Steer It?

Why It Matters

AI Research

Indonesian volcano Mount Lewotobi Laki-laki spews massive ash cloud as it erupts again

AI Research

What makes a good AI prompt? Here are 4 expert tips

Crafting better prompts

Context is everything

Keeping your wits

Trending

Leave a Reply
Cancel reply