AI Research

Tencent improves testing creative AI models with new benchmark

Published

July 9, 2025

Tencent has introduced a new benchmark, ArtifactsBench, that aims to fix current problems with testing creative AI models.

Ever asked an AI to build something like a simple webpage or a chart and received something that works but has a poor user experience? The buttons might be in the wrong place, the colours might clash, or the animations feel clunky. It’s a common problem, and it highlights a huge challenge in the world of AI development: how do you teach a machine to have good taste?

For a long time, we’ve been testing AI models on their ability to write code that is functionally correct. These tests could confirm the code would run, but they were completely “blind to the visual fidelity and interactive integrity that define modern user experiences.”

This is the exact problem ArtifactsBench has been designed to solve. It’s less of a test and more of an automated art critic for AI-generated code.

🚀Thrilled to introduce #ArtifactsBench! We’re bridging the visual-interactive gap in code generation evaluation.

Our benchmark uses a novel automated, multimodal pipeline to assess LLMs on 1,825 diverse tasks. An MLLM-as-Judge evaluates visual artifacts, achieving 94.4% ranking…

— Hunyuan (@TencentHunyuan) July 9, 2025

Getting it right, like a human would

So, how does Tencent’s AI benchmark work? First, an AI is given a creative task from a catalogue of over 1,800 challenges, from building data visualisations and web apps to making interactive mini-games.

Once the AI generates the code, ArtifactsBench gets to work. It automatically builds and runs the code in a safe, sandboxed environment.

To see how the application behaves, it captures a series of screenshots over time. This allows it to check for things like animations, state changes after a button click, and other dynamic user feedback.

Finally, it hands all this evidence (the original request, the AI’s code, and the screenshots) to a multimodal LLM (MLLM), which acts as the judge.
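
To make the hand-off concrete, here is a minimal Python sketch of the evidence bundle described above, assuming a simple in-memory structure. The class names (`Screenshot`, `EvidenceBundle`), their fields and the text/image part format are hypothetical illustrations, not ArtifactsBench’s actual code or data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Screenshot:
    """One captured frame of the running artifact (hypothetical structure)."""
    timestamp_ms: int   # time since the artifact was launched
    image_path: str     # where the rendered frame was saved
    note: str = ""      # what happened just before, e.g. "after clicking Start"


@dataclass
class EvidenceBundle:
    """What the judge receives for one task, per the article: the original
    request, the generated code and the screenshots (field names assumed)."""
    task_prompt: str
    generated_code: str
    screenshots: List[Screenshot] = field(default_factory=list)

    def to_judge_parts(self) -> List[Dict[str, str]]:
        """Flatten the bundle into ordered text and image parts, a shape loosely
        modelled on multimodal chat APIs (not ArtifactsBench's real format)."""
        parts = [
            {"type": "text", "text": f"Task:\n{self.task_prompt}"},
            {"type": "text", "text": f"Code:\n{self.generated_code}"},
        ]
        # Chronological order lets the judge follow animations and state changes.
        for shot in sorted(self.screenshots, key=lambda s: s.timestamp_ms):
            parts.append({"type": "text", "text": f"Frame at {shot.timestamp_ms} ms: {shot.note}"})
            parts.append({"type": "image", "path": shot.image_path})
        return parts


# Hypothetical task with two frames captured out of order.
bundle = EvidenceBundle(
    task_prompt="Build a counter with an Increment button",
    generated_code="<button id='inc'>Increment</button>",
    screenshots=[
        Screenshot(1500, "frame_1.png", "after clicking Increment"),
        Screenshot(0, "frame_0.png", "initial render"),
    ],
)
for part in bundle.to_judge_parts():
    print(part)
```

Sorting by timestamp is a deliberate choice here: a before/after pair only shows a state change if the judge sees the frames in the order they were captured.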

Rather than giving a vague opinion, this MLLM judge uses a detailed, per-task checklist to score the result across ten different metrics, including functionality, user experience, and even aesthetic quality. This is designed to keep the scoring fair, consistent, and thorough.
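
The checklist idea can be sketched as follows. In ArtifactsBench the MLLM itself reads each checklist and assigns the scores; here, hypothetical pass/fail outcomes stand in for the judge’s verdicts, and the code only shows one plausible way to roll them up into per-metric and overall scores. The metric keys, the 0 to 10 scale and the unweighted mean are assumptions, and for brevity only three of the ten metrics appear.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical checklist outcomes: metric name -> pass/fail for each checklist item.
# The article names functionality, user experience and aesthetic quality among the
# ten metrics; the key names and the 0-10 scale below are assumptions.
ChecklistResult = Dict[str, List[bool]]


def score_metric(items: List[bool], max_score: float = 10.0) -> float:
    """Score one metric as the share of its checklist items that passed."""
    if not items:
        raise ValueError("each metric needs at least one checklist item")
    return max_score * sum(items) / len(items)


def score_artifact(result: ChecklistResult) -> Dict[str, float]:
    """Return per-metric scores plus an unweighted overall mean."""
    scores = {metric: score_metric(items) for metric, items in result.items()}
    scores["overall"] = mean(scores.values())
    return scores


example = {
    "functionality": [True, True, True, False],  # 3 of 4 items pass -> 7.5
    "user_experience": [True, False],            # 1 of 2 -> 5.0
    "aesthetic_quality": [True, True],           # 2 of 2 -> 10.0
}
print(score_artifact(example))
```

A per-task checklist keeps each score tied to concrete, inspectable criteria, which is what makes results comparable across runs.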

The big question is, does this automated judge actually have good taste? The results suggest it does.

When the rankings from ArtifactsBench were compared to WebDev Arena, the gold-standard platform where real humans vote on the best AI creations, they matched up with a 94.4% consistency. This is a massive leap from older automated benchmarks, which only managed around 69.4% consistency.
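
The article does not define how the consistency figures are calculated. One common way to compare two leaderboards is pairwise agreement: take every pair of models ranked by both and count how often the two rankings put them in the same order. The sketch below assumes that definition; the model names and ranks are made up, and the paper’s exact method may differ.

```python
from itertools import combinations
from typing import Dict


def pairwise_consistency(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Share of model pairs that two leaderboards put in the same order.

    Ranks are positions (1 = best). Only models ranked by both leaderboards
    are compared; tied pairs count as disagreements (a simplifying assumption).
    """
    shared = sorted(set(rank_a) & set(rank_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two models ranked by both leaderboards")
    agreeing = sum(
        1
        for x, y in pairs
        # Same sign of rank difference means both boards order x and y alike.
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) > 0
    )
    return agreeing / len(pairs)


# Made-up ranks for four hypothetical models; only model_b and model_c swap places.
benchmark = {"model_a": 1, "model_b": 2, "model_c": 3, "model_d": 4}
human_votes = {"model_a": 1, "model_b": 3, "model_c": 2, "model_d": 4}
print(round(pairwise_consistency(benchmark, human_votes), 3))  # 0.833
```

With four models there are six pairs; a single swapped pair leaves five of six in agreement.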

On top of this, the framework’s judgments showed over 90% agreement with professional human developers.

Tencent evaluates the creativity of top AI models with its new benchmark

When Tencent put more than 30 of the world’s top AI models through their paces, the leaderboard was revealing. While top commercial models from Google (Gemini-2.5-Pro) and Anthropic (Claude 4.0-Sonnet) took the lead, the tests unearthed a fascinating insight.

You might think that an AI specialised in writing code would be the best at these tasks. But the opposite was true. The research found that “the holistic capabilities of generalist models often surpass those of specialized ones.”

A general-purpose model, Qwen2.5-Instruct, actually beat its more specialised siblings, Qwen2.5-Coder (a code-specific model) and Qwen2.5-VL (a vision-specialised model).

The researchers believe this is because creating a great visual application isn’t just about coding or visual understanding in isolation; it requires a blend of skills.

The researchers highlight “robust reasoning, nuanced instruction following, and an implicit sense of design aesthetics” as examples of these vital skills. These are the kinds of well-rounded, almost human-like abilities that the best generalist models are beginning to develop.

Tencent hopes ArtifactsBench can reliably evaluate these qualities and so measure future progress in AI’s ability to create things that are not just functional but that users actually want to use.

Ryan Daws

E-research library with AI tools to assist lawyers

Published

July 9, 2025

The Editors

New Delhi: In an attempt to integrate legal work in courts with artificial intelligence, the Bar Council of Delhi (BCD) has opened a one-of-its-kind e-research library at the Rouse Avenue courts. Inaugurated on July 5 by law minister Kapil Mishra, the library offers various software tools to assist lawyers in their legal work. It was set up with initial funding of Rs 20 lakh, and BCD functionaries told TOI that they are also planning to expand it so that it can be accessed from anywhere.

Named after former BCD chairman BS Sherawat, the library boasts an integrated system across 15 desktops, including the legal research platform SCC Online, the online legal research database Manupatra, the AI platform Lucio, and several e-books on law.

Advocate Neeraj, president of the Central Delhi Court Bar Association, told TOI, “The vision behind this initiative is to help law practitioners in their research. Lawyers are the officers of the honourable court who assist the judicial officer to reach a verdict in cases. This library will help lawyers in their legal work. Keeping that in mind, considering a request by our association, BCD provided us with funds and resources.”

The library, which is open from 9:30 am to 5:30 pm, aims to use advances in technology to allow access from anywhere in the country. “We are thinking along those lines too. It will be good if a lawyer needs some research on some law point and can access the AI tools from anywhere; she will be able to upgrade herself immediately to assist the court and present her case more efficiently,” added Neeraj.

Staffed with one technical person and a superintendent, the facility will cost around Rs 1 lakh per month to keep running.

With more than 15.3 lakh cases now pending in Delhi’s district courts, AI tools can help law practitioners as well as the courts.

Advocate Vikas Tripathi, vice-president of the Central Delhi Court Bar Association, said, “Imagine AI tools which can give you relevant references, cite related judgments, and even prepare a case if provided with proper inputs. The AI tools have immense potential.”

In July 2024, ‘Adalat AI’ was inaugurated in Delhi’s district courts. This AI-driven speech recognition software is designed to assist court stenographers by transcribing witness examinations and orders dictated by judges, and to streamline court workflow. The tool automates many processes: a judicial officer only has to log in, press a few buttons, and speak their observations, which are transcribed automatically, legal language included. The order is then prepared automatically.

Manmohan, then Chief Justice of the Delhi High Court and now a Supreme Court judge, said, “The biggest problem I see judges facing is that there is a large demand for stenographers, but there’s not a large pool available. I think this app will solve that problem to a large extent. It will ensure that a large pool of stenographers will become available for other purposes.” At present, the application is being used in at least eight states, including Kerala, Karnataka, Andhra Pradesh, Delhi, Bihar, Odisha, Haryana and Punjab.

Optimized Artificial Intelligence Responds to Search Preferences Survey

Published

July 9, 2025

The Editors

83% of survey respondents prefer AI search over traditional Googling. LLMO agency, Optimized Artificial Intelligence, calls it the “new default,” not a trend.

(PRUnderground) July 9th, 2025

A new survey reported by “Innovating with AI Magazine” confirms what forward-looking brands have already begun to suspect: 83% of users say they now prefer AI search tools like ChatGPT, Perplexity, and Claude over traditional Googling.(1) For Optimized Artificial Intelligence, a leading AI optimization agency founded by SEO veteran Damon Burton, this marks not a momentary shift but the dawn of a new default in digital behavior.

“This survey isn’t surprising. It’s validating,” said Burton, Founder of Optimized Artificial Intelligence and President of SEO National. “Consumers are clearly signaling that they no longer want to wade through pages of links. They want direct, synthesized answers, and they’re finding them through AI search platforms. That changes the entire playbook for SEO.”

The “Innovating with AI Magazine” report notes that ChatGPT now sees over 200 million weekly active users and that Google’s market share has dipped below 90% for the first time in nearly a decade. Tools like Microsoft’s Copilot, Claude by Anthropic, and Perplexity AI are redefining how information is retrieved and who gets cited.

Brands Can’t Rely on Legacy Search Alone

Optimized Artificial Intelligence has been at the forefront of large language model optimization (LLMO), a strategic evolution of SEO that prepares content not just for ranking on SERPs but for retrieval, citation, and trust in generative AI tools.

“The reality is, most businesses are still optimizing for a search engine that’s disappearing from user behavior,” said Burton. “Google isn’t dying, but it’s being re-prioritized. If your content isn’t LLM optimized by being structured, cited, and semantically relevant, you’re already losing opportunities.”

Optimized Artificial Intelligence’s proprietary approach to LLMO, also called generative engine optimization (GEO), includes:

Entity-first schema structuring
Semantic content clustering for LLM retrieval
Platform-specific tuning for ChatGPT, Gemini, Claude, Copilot, Perplexity, and more
Reputation signal optimization to increase brand inclusion in AI-generated summaries
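
The release does not explain how these techniques work. As a generic illustration of what the first item can mean in practice, sites commonly describe their organisation as a schema.org entity in JSON-LD. The sketch below is not Optimized Artificial Intelligence’s proprietary method, the function name `organization_jsonld` and the company details are placeholders, and whether AI search tools give such markup any weight is not something the release demonstrates.

```python
import json
from typing import List


def organization_jsonld(name: str, url: str, same_as: List[str]) -> str:
    """Build a schema.org Organization entity serialised as JSON-LD.

    `sameAs` points to other pages that identify the same organisation,
    which is meant to help consumers of the markup tie them to one entity.
    """
    entity = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    return json.dumps(entity, indent=2)


# Placeholder values for illustration only.
markup = organization_jsonld(
    "Example Co",
    "https://www.example.com",
    ["https://www.linkedin.com/company/example-co"],
)
# On a web page this string would sit inside <script type="application/ld+json">.
print(markup)
```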

Why This Matters for the Future of Discovery

The “Innovating with AI Magazine” report also highlights challenges: hallucinations, misinformation, and a lack of third-party visibility. But Burton argues this is precisely why strategy matters now more than ever.

“Hallucinations are a technical challenge, but they’re also a signal. LLMs choose what they cite based on structure, clarity, and trust. If your brand isn’t showing up in AI-generated responses, it’s not because AI search is broken. It’s because your content isn’t optimized for how these models think.”

Call to Action for Forward-Thinking Brands

As Google cannibalizes its own SERPs in favor of AI Overviews and third-party visibility continues to shrink, Burton urges brands to adapt, and fast: “This is the end of traditional SEO as we knew it. But it’s the beginning of something better: precision-targeted, AI-friendly optimization that earns trust, not just traffic.”

To learn more about SEO for AI search engines and how to get found and cited across platforms like ChatGPT, Claude, Gemini, Perplexity, and Copilot, visit www.OptimizedArtificialIntelligence.com.

(1) https://innovatingwithai.com/is-ai-search-replacing-traditional-search/

About Optimized Artificial Intelligence

Optimized Artificial Intelligence offers tailored AI solutions designed to enhance business operations and drive growth. Their services include developing custom AI models, automating workflows, and providing data-driven insights to help businesses make informed decisions.

Enterprises will strengthen networks to take on AI, survey finds

Published

July 9, 2025

Denise Dubie

Private data centers: 29.5%
Traditional public cloud: 35.4%
GPU as a service specialists: 18.5%
Edge compute: 16.6%

“There is little variation from training to inference, but the general pattern is workloads are concentrated a bit in traditional public cloud and then hyperscalers have significant presence in private data centers,” McGillicuddy explained. “There is emerging interest around deploying AI workloads at the corporate edge and edge compute environments as well, which allows them to have workloads residing closer to edge data in the enterprise, which helps them combat latency issues and things like that. The big key takeaway here is that the typical enterprise is going to need to make sure that its data center network is ready to support AI workloads.”

AI networking challenges

The popularity of AI doesn’t remove the business and technical concerns that the technology raises for enterprise leaders.

According to the EMA survey, business concerns include security risk (39%), cost/budget (33%), rapid technology evolution (33%), and networking team skills gaps (29%). Respondents also raised concerns about both data center networking and the WAN. Data center networking concerns included:

Integration between AI network and legacy networks: 43%
Bandwidth demand: 41%
Coordinating traffic flows of synchronized AI workloads: 38%
Latency: 36%

WAN concerns included:

Complexity of workload distribution across sites: 42%
Latency between workloads and data at WAN edge: 39%
Complexity of traffic prioritization: 36%
Network congestion: 33%

“It’s really not cheap to make your network AI ready,” McGillicuddy stated. “You might need to invest in a lot of new switches and you might need to upgrade your WAN or switch vendors. You might need to make some changes to your underlay around what kind of connectivity your AI traffic is going over.”

Enterprise leaders intend to invest in infrastructure to support their AI workloads and strategies. According to EMA, planned infrastructure investments include high-speed Ethernet (800 GbE) for 75% of respondents, hyperconverged infrastructure for 56% of those polled, and SmartNICs/DPUs for 45% of surveyed network professionals.
