Funding & Business

PepsiCo Boosts Stake in Celsius With $585 Million Deal


John Fieldly, Celsius CEO, says energy drinks are the fastest-growing category in the market and the deal with Pepsi’s Rockstar maximizes the value of Celsius’ portfolio. He tells Scarlet Fu and Bailey Lipschultz on “The Close” the company is also focusing on female consumers with a lifestyle brand for today’s health-minded customer. (Source: Bloomberg)




Funding & Business

Software commands 40% of cybersecurity budgets as gen AI attacks execute in milliseconds


Software spending now makes up 40% of cybersecurity budgets, with investment expected to grow as CISOs prioritize real-time AI defenses.



Funding & Business

How Sakana AI’s new evolutionary algorithm builds powerful AI models without expensive retraining


A new evolutionary technique from Japan-based AI lab Sakana AI enables developers to augment the capabilities of AI models without costly training and fine-tuning processes. The technique, called Model Merging of Natural Niches (M2N2), overcomes the limitations of other model merging methods and can even evolve new models entirely from scratch.

M2N2 can be applied to different types of machine learning models, including large language models (LLMs) and text-to-image generators. For enterprises looking to build custom AI solutions, the approach offers a powerful and efficient way to create specialized models by combining the strengths of existing open-source variants.

What is model merging?

Model merging is a technique for integrating the knowledge of multiple specialized AI models into a single, more capable model. Instead of fine-tuning, which refines a single pre-trained model using new data, merging combines the parameters of several models simultaneously. This process can consolidate a wealth of knowledge into one asset without requiring expensive, gradient-based training or access to the original training data.

For enterprise teams, this offers several practical advantages over traditional fine-tuning. In comments to VentureBeat, the paper’s authors said model merging is a gradient-free process that only requires forward passes, making it computationally cheaper than fine-tuning, which involves costly gradient updates. Merging also sidesteps the need for carefully balanced training data and mitigates the risk of “catastrophic forgetting,” where a model loses its original capabilities after learning a new task. The technique is especially powerful when the training data for specialist models isn’t available, as merging only requires the model weights themselves.
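As a rough illustration of what "combining the parameters" can mean, the sketch below blends two models weight by weight with a single ratio. It is a minimal toy, not the method from the paper: the `merge_weights` function, the parameter names, and the plain-list weight format are all assumptions made for this example, and real merges operate on tensors with identical architectures.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # hypothetical format: parameter name -> flat list of values


def merge_weights(model_a: Weights, model_b: Weights, ratio: float) -> Weights:
    """Blend two models parameter by parameter: ratio * A + (1 - ratio) * B."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in model_a:
        a, b = model_a[name], model_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        # No gradients and no training data are involved, only the weights.
        merged[name] = [ratio * x + (1.0 - ratio) * y for x, y in zip(a, b)]
    return merged


# Toy stand-ins for two specialist models with the same architecture.
math_model = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
agent_model = {"layer0": [3.0, 0.0], "layer1": [2.0, 0.0]}

merged = merge_weights(math_model, agent_model, ratio=0.5)
print(merged)
```

Choosing the ratio is the hard part in practice; evaluating each candidate blend only needs forward passes on a validation set.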



Early approaches to model merging required significant manual effort, as developers adjusted coefficients through trial and error to find the optimal blend. More recently, evolutionary algorithms have helped automate this process by searching for the optimal combination of parameters. However, a significant manual step remains: developers must set fixed sets for mergeable parameters, such as layers. This restriction limits the search space and can prevent the discovery of more powerful combinations.
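To make the idea of automating the coefficient search concrete, here is a minimal sketch of a mutation-based evolutionary search over a single mixing ratio. Everything in it is illustrative: `evolve_ratio` is a hypothetical helper, and `toy_fitness` stands in for what would really be a benchmark evaluation of the merged model.

```python
import random
from typing import Callable, Tuple


def evolve_ratio(
    fitness: Callable[[float], float],
    generations: int = 50,
    population: int = 8,
    sigma: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float]:
    """Search for a mixing ratio in [0, 1] by mutating the best candidate so far."""
    rng = random.Random(seed)
    best = rng.random()
    best_score = fitness(best)
    for _ in range(generations):
        for _ in range(population):
            # Mutate the current best and clamp it to the valid range.
            candidate = min(1.0, max(0.0, best + rng.gauss(0.0, sigma)))
            score = fitness(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


def toy_fitness(ratio: float) -> float:
    """Assumed stand-in for scoring a merged model; it peaks at ratio = 0.3."""
    return -((ratio - 0.3) ** 2)


best_ratio, best_score = evolve_ratio(toy_fitness)
print(round(best_ratio, 3), best_score)
```

The search only ever tunes the coefficient; which parameters are grouped together is still fixed in advance, which is the restriction described above.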

How M2N2 works

M2N2 addresses these limitations by drawing inspiration from evolutionary principles in nature. The algorithm has three key features that allow it to explore a wider range of possibilities and discover more effective model combinations.

Figure: Model Merging of Natural Niches (Source: arXiv)

First, M2N2 eliminates fixed merging boundaries, such as blocks or layers. Instead of grouping parameters by pre-defined layers, it uses flexible “split points” and “mixing ratios” to divide and combine models. This means that, for example, the algorithm might merge 30% of the parameters in one layer from Model A with 70% of the parameters from the same layer in Model B. The process starts with an “archive” of seed models. At each step, M2N2 selects two models from the archive, determines a mixing ratio and a split point, and merges them. If the resulting model performs well, it is added back to the archive, replacing a weaker one. This allows the algorithm to explore increasingly complex combinations over time. As the researchers note, “This gradual introduction of complexity ensures a wider range of possibilities while maintaining computational tractability.”
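The archive loop described above can be sketched on toy parameter vectors. This is a simplified reading, not the released implementation: the exact merge rule (blend with `ratio` before the split point and `1 - ratio` after it, using linear interpolation), the uniform random pairing, and the names `split_merge`, `evolve_archive`, and `toy_score` are all assumptions for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]  # hypothetical flat parameter vector


def split_merge(a: Vector, b: Vector, split: int, ratio: float) -> Vector:
    """Blend a and b with `ratio` before `split` and with `1 - ratio` after it."""
    head = [ratio * x + (1 - ratio) * y for x, y in zip(a[:split], b[:split])]
    tail = [(1 - ratio) * x + ratio * y for x, y in zip(a[split:], b[split:])]
    return head + tail


def evolve_archive(
    archive: List[Vector], score: Callable[[Vector], float], steps: int = 200, seed: int = 0
) -> List[Vector]:
    """Repeatedly merge two archive members and replace the weakest if the child is better."""
    rng = random.Random(seed)
    archive = [list(m) for m in archive]  # work on a copy
    for _ in range(steps):
        a, b = rng.sample(archive, 2)
        split = rng.randrange(len(a) + 1)  # the split point may fall anywhere
        ratio = rng.random()
        child = split_merge(a, b, split, ratio)
        weakest = min(range(len(archive)), key=lambda i: score(archive[i]))
        if score(child) > score(archive[weakest]):
            archive[weakest] = child
    return archive


# Toy task: get as close as possible to a target vector.
target = [1.0, 1.0, 1.0, 1.0]


def toy_score(model: Vector) -> float:
    return -sum((x - t) ** 2 for x, t in zip(model, target))


seeds = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
final_archive = evolve_archive(seeds, toy_score)
best = max(final_archive, key=toy_score)
print(best, toy_score(best))
```

Because the split point is sampled freely rather than tied to layer boundaries, a single merge can take the front of one model and the back of another, which a fixed per-layer scheme cannot express.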

Second, M2N2 manages the diversity of its model population through competition. To understand why diversity is crucial, the researchers offer a simple analogy: “Imagine merging two answer sheets for an exam… If both sheets have exactly the same answers, combining them does not make any improvement. But if each sheet has correct answers for different questions, merging them gives a much stronger result.” Model merging works the same way. The challenge, however, is defining what kind of diversity is valuable. Instead of relying on hand-crafted metrics, M2N2 simulates competition for limited resources. This nature-inspired approach naturally rewards models with unique skills, as they can “tap into uncontested resources” and solve problems others can’t. These niche specialists, the authors note, are the most valuable for merging.
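One common way to simulate competition for limited resources is to let every data point offer a fixed amount of reward that is split among the models that solve it. The sketch below follows that idea; the formula, the `shared_fitness` name, and the `capacity` and `eps` parameters are assumptions for this example and may differ from the paper's exact definition.

```python
from typing import List


def shared_fitness(scores: List[List[float]], capacity: float = 1.0, eps: float = 1e-9) -> List[float]:
    """scores[i][j] is model i's score on example j; each example's reward is shared."""
    n_examples = len(scores[0])
    # Total demand on each example across the whole population.
    totals = [sum(model[j] for model in scores) for j in range(n_examples)]
    return [
        sum(capacity * model[j] / (totals[j] + eps) for j in range(n_examples))
        for model in scores
    ]


scores = [
    [1.0, 1.0, 0.0],  # generalist A
    [1.0, 1.0, 0.0],  # generalist B, identical to A
    [1.0, 1.0, 0.0],  # generalist C, identical to A
    [0.0, 0.0, 1.0],  # niche specialist: solves only the example nobody else can
]
fitness = shared_fitness(scores)
print([round(f, 3) for f in fitness])
```

By raw accuracy the specialist looks weakest (one correct answer out of three), but under shared rewards it scores highest because its example is uncontested, so it is the one the population is least likely to discard.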

Third, M2N2 uses a heuristic called “attraction” to pair models for merging. Rather than simply combining the top-performing models as in other merging algorithms, it pairs them based on their complementary strengths. An “attraction score” identifies pairs where one model performs well on data points that the other finds challenging. This improves both the efficiency of the search and the quality of the final merged model.
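A simplified version of such a score sums, over all data points, how much the candidate partner outperforms the first model. The sketch below uses that unweighted form; the `attraction` and `pick_partner` functions and the candidate names are hypothetical, and the paper's score may weight data points differently.

```python
from typing import Dict, List


def attraction(scores_a: List[float], scores_b: List[float]) -> float:
    """How much model B adds where model A is weak (only B's advantages count)."""
    return sum(max(b - a, 0.0) for a, b in zip(scores_a, scores_b))


def pick_partner(first: List[float], candidates: Dict[str, List[float]]) -> str:
    """Choose the candidate with the highest attraction to the first model."""
    return max(candidates, key=lambda name: attraction(first, candidates[name]))


# Per-example scores on four data points.
first_model = [1.0, 1.0, 0.0, 0.0]
candidates = {
    "similar": [1.0, 1.0, 1.0, 0.0],        # highest raw accuracy, mostly overlaps
    "complementary": [0.0, 0.0, 1.0, 1.0],  # lower accuracy, covers both gaps
    "duplicate": [1.0, 1.0, 0.0, 0.0],      # adds nothing new
}

partner = pick_partner(first_model, candidates)
print(partner)
```

Here the "similar" model has the best standalone accuracy, yet the "complementary" model is chosen because it succeeds on exactly the points the first model fails.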

M2N2 in action

The researchers tested M2N2 across three different domains, demonstrating its versatility and effectiveness.

The first was a small-scale experiment evolving neural network–based image classifiers from scratch on the MNIST dataset. M2N2 achieved the highest test accuracy by a substantial margin compared to other methods. The results showed that its diversity-preservation mechanism was key, allowing it to maintain an archive of models with complementary strengths that facilitated effective merging while systematically discarding weaker solutions.

Next, they applied M2N2 to LLMs, combining a math specialist model (WizardMath-7B) with an agentic specialist (AgentEvol-7B), both of which are based on the Llama 2 architecture. The goal was to create a single agent that excelled at both math problems (GSM8K dataset) and web-based tasks (WebShop dataset). The resulting model achieved strong performance on both benchmarks, showcasing M2N2’s ability to create powerful, multi-skilled models.

Figure: A model merge with M2N2 combines the best of both seed models (Source: arXiv)

Finally, the team merged diffusion-based image generation models. They combined a model trained on Japanese prompts (JSDXL) with three Stable Diffusion models primarily trained on English prompts. The objective was to create a model that combined the best image generation capabilities of each seed model while retaining the ability to understand Japanese. The merged model not only produced more photorealistic images with better semantic understanding but also developed an emergent bilingual ability. It could generate high-quality images from both English and Japanese prompts, even though it was optimized exclusively using Japanese captions.

For enterprises that have already developed specialist models, the business case for merging is compelling. The authors point to new, hybrid capabilities that would be difficult to achieve otherwise. For example, merging an LLM fine-tuned for persuasive sales pitches with a vision model trained to interpret customer reactions could create a single agent that adapts its pitch in real time based on live video feedback. This unlocks the combined intelligence of multiple models at the cost and latency of running just one.

Looking ahead, the researchers see techniques like M2N2 as part of a broader trend toward “model fusion.” They envision a future where organizations maintain entire ecosystems of AI models that are continuously evolving and merging to adapt to new challenges.

“Think of it like an evolving ecosystem where capabilities are combined as needed, rather than building one giant monolith from scratch,” the authors suggest.

The researchers have released the code of M2N2 on GitHub.

The biggest hurdle to this dynamic, self-improving AI ecosystem, the authors believe, is not technical but organizational. “In a world with a large ‘merged model’ made up of open-source, commercial, and custom components, ensuring privacy, security, and compliance will be a critical problem.” For businesses, the challenge will be figuring out which models can be safely and effectively absorbed into their evolving AI stack.




Funding & Business

A $23 Trillion Cash Pile Holds Key for Chinese Stocks’ Bull Run

China’s stock rally is set to get a boost from small investors, stoking hopes that their massive savings will fuel the next leg of the market’s blistering advance.

The benchmark CSI 300 Index has been on a tear, rising 10% in August to be one of the world’s best-performing equity gauges amid a liquidity-driven surge. While hedge funds have been active in the market, analysts say the nation’s mom-and-pop investors are still in the early stages of what could be a major rotation into stocks and equity funds.


