Connect with us

Jobs & Careers

Google Gemini 2.5 Flash Will Be Processed Locally in India

Published

on


Google announced at its I/O event in Bengaluru that its cost-efficient, low-latency AI model, Gemini 2.5 Flash, will now be processed locally in India. 

This means that the AI model will be processed in data centre facilities within the country, which prevents the need for cross-border transfer of user data. 

“This high-performance model can be accessed via Single Zone Provisioned Throughput within Google Cloud regions in India, ensuring top-tier stability and speed,” said Google in a blog post.

“This will especially help developers build for regulated industries like healthcare, banking and finance, and for the public sector, where data residency and low-latency processing are critical,” the company added. 

While Google’s flagship is the Gemini 2.5 Pro model, the Gemini 2.5 Flash provides an excellent balance of price and performance, according to multiple developers and benchmark tests. Additionally, it is one of the fastest AI models in terms of output speed. 

Similarly, the company announced last year that the Gemini 1.5 flash will be processed locally in India. 

Google operates cloud regions in India to support its cloud services. These facilities manage and process data and AI workloads for both individuals and businesses. 

The company operates two data centre zones in Mumbai and Delhi, and several reports from last year indicated that the company is working on expanding its footprint in Navi Mumbai. 

Developments such as these are in alignment with the forthcoming Digital Personal Data Protection Act of India, which advocates for the localisation of personal user data. 

On the other hand, OpenAI, a major competitor of Google in the AI industry, is reportedly planning to build data centres in India. Recently, the company introduced a data residency program across Asia, enabling organisations in Japan, India, Singapore, and South Korea to keep their data locally while using its services.

“With data residency, eligible API customers and new ChatGPT Enterprise and Edu customers can choose to have customer content stored at rest in supported countries,” said the company.



Source link

Jobs & Careers

Visa Launches MCP Server and Agent Toolkit to Advance Agentic Commerce

Published

on


Visa has expanded its Intelligent Commerce program with the introduction of a Model Context Protocol (MCP) Server and a Visa Acceptance Agent Toolkit, designed to help developers and business users connect AI agents directly to Visa’s network.

The MCP Server allows developers to link AI agents and large language models with Visa Intelligent Commerce APIs, creating a standardised and secure way to integrate payments. “For AI agents and LLMs to interact with Visa’s trusted network, they need a secure, consistent way to communicate with our services,” the company said in its announcement.

According to Visa, the MCP Server eliminates the need for custom-built integrations, accelerates prototype development, and allows agents to dynamically apply Visa APIs to commerce tasks. Early adopters within Visa have already used the technology to streamline generative AI workflows.

The company also announced the pilot of the Visa Acceptance Agent Toolkit, which runs on the MCP Server. It is designed to let both developers and non-technical users complete commerce tasks in plain language without coding.

 “Now available in pilot, the Visa Acceptance Agent Toolkit empowers both developers and business users to put agentic commerce into action — without writing a single line of code,” Visa noted.

Initial use cases include creating invoices and summarising transaction data through natural language commands. For example, a user could request: “Create an invoice for $100 for John Doe, due Friday,” and the agent would process the request through Visa’s Invoice API.

The Toolkit is currently available as a self-hosted package via npm for JavaScript developers, with all actions routed through the MCP Server under Visa’s security and access controls.

Visa said both the MCP Server and Toolkit remain in pilot while the company explores further B2B and B2C applications. “Trust is crucial for enabling AI commerce,” Visa stated, adding that its decades of work with machine learning and datasets position it to support secure, next-generation payments at scale.

The post Visa Launches MCP Server and Agent Toolkit to Advance Agentic Commerce appeared first on Analytics India Magazine.



Source link

Continue Reading

Jobs & Careers

Top 7 Small Language Models

Published

on


Top 7 Small Language Models
Image by Author

 

Introduction

 
Small language models (SLMs) are quickly becoming the practical face of AI. They are getting faster, smarter, and far more efficient, delivering strong results with a fraction of the compute, memory, and energy that large models require.

A growing trend in the AI community is to use large language models (LLMs) to generate synthetic datasets, which are then used to fine-tune SLMs for specific tasks or to adopt particular styles. As a result, SLMs are becoming smarter, faster, and more specialized, all while maintaining a compact size. This opens up exciting possibilities: you can now embed intelligent models directly into systems that don’t require a constant internet connection, enabling on-device intelligence for privacy, speed, and reliability.

In this tutorial, we will review some of the top small language models making waves in the AI world. We will compare their size and performance, helping you understand which models offer the best balance for your needs.

 

1. google/gemma-3-270m-it

 
The Gemma 3 270M model is the smallest and most ultra-lightweight member of the Gemma 3 family, designed for efficiency and accessibility. With just 270 million parameters, it can run smoothly on devices with limited computational resources, making it ideal for experimentation, prototyping, and lightweight applications.

Despite its compact size, the 270M model supports a 32K context window and can handle a wide range of tasks such as basic question answering, summarization, and reasoning.

 

2. Qwen/Qwen3-0.6B

 
The Qwen3-0.6B model is the most lightweight variant in the Qwen3 series, designed to deliver strong performance while remaining highly efficient and accessible. With 600 million parameters (0.44B non-embedding), it strikes a balance between capability and resource requirements.

Qwen3-0.6B comes with the ability to seamlessly switch between “thinking mode” for complex reasoning, math, and coding, and “non-thinking mode” for fast, general-purpose dialogue. It supports a 32K context length and offers multilingual support across 100+ languages.

 

3. HuggingFaceTB/SmolLM3-3B

 
The SmolLM3-3B model is a small yet powerful open-source language model designed to push the limits of small-scale language models. With 3 billion parameters, it delivers strong performance in reasoning, math, coding, and multilingual tasks while remaining efficient enough for broader accessibility.

SmolLM3 supports dual-mode reasoning, allowing users to toggle between extended “thinking mode” for complex problem-solving and a faster, lightweight mode for general dialogue.

Beyond text generation, SmolLM3 also enables agentic usage with tool calling, making it versatile for real-world applications. As a fully open model with public training details, open weights, and checkpoints, SmolLM3 provides researchers and developers with a transparent, high-performance foundation for building reasoning-capable AI systems at the 3B–4B scale.

 

4. Qwen/Qwen3-4B-Instruct-2507

 
The Qwen3-4B-Instruct-2507 model is an updated instruction-tuned variant of the Qwen3-4B series, designed to deliver stronger performance in non-thinking mode. With 4 billion parameters (3.6B non-embedding), it introduces major improvements across instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage, while also expanding long-tail knowledge coverage across multiple languages.

Unlike other Qwen3 models, this version is optimized exclusively for non-thinking mode, ensuring faster, more efficient responses without generating reasoning tokens. It also demonstrates better alignment with user preferences, excelling in open-ended and creative tasks such as writing, dialogue, and subjective reasoning.

 

5. google/gemma-3-4b-it

 
The Gemma 3 4b model is an instruction-tuned, multimodal member of the Gemma 3 family, designed to handle both text and image inputs while generating high-quality text outputs. With 4 billion parameters and support for a 128K token context window, it is well-suited for tasks such as question answering, summarization, reasoning, and detailed image understanding.

Importantly, it is highly used for fine-tuning on text classification, image classification, or specialized tasks, which further improves the model’s specialization and performance for certain domains.

 

6. janhq/Jan-v1-4B

 
The Jan-v1 model is the first release in the Jan Family, built specifically for agentic reasoning and problem-solving within the Jan App. Based on the Lucy model and powered by the Qwen3-4B-thinking architecture, Jan-v1 delivers enhanced reasoning capabilities, tool utilization, and improved performance on complex agentic tasks.

By scaling the model and fine-tuning its parameters, it has achieved an impressive accuracy of 91.1% on SimpleQA. This marks a significant milestone in factual question answering for models of this size. It is optimized for local use with the Jan app, vLLM, and llama.cpp, with recommended settings to enhance performance.

 

7. microsoft/Phi-4-mini-instruct

 
The Phi-4-mini-instruct model is a lightweight 3.8B parameter language model from Microsoft’s Phi-4 family, designed for efficient reasoning, instruction following, and safe deployment in both research and commercial applications.

Trained on a mix of 5T tokens from high-quality filtered web data, synthetic “textbook-like” reasoning data, and curated supervised instruction data, it supports a 128K token context length and excels in math, logic, and multilingual tasks.

Phi-4-mini-instruct also supports function calling, multilingual generation (20+ languages), and integration with frameworks like vLLM and Transformers, enabling flexible deployment.

 

Conclusion

 
This article explores a new wave of lightweight yet powerful open models that are reshaping the AI landscape by balancing efficiency, reasoning, and accessibility.

From Google’s Gemma 3 family with the ultra-compact gemma-3-270m-it and the multimodal gemma-3-4b-it, to Qwen’s Qwen3 series with the efficient Qwen3-0.6B and the long-context, instruction-optimized Qwen3-4B-Instruct-2507, these models highlight how scaling and fine-tuning can unlock strong reasoning and multilingual capabilities in smaller footprints.

SmolLM3-3B pushes the boundaries of small models with dual-mode reasoning and long-context support, while Jan-v1-4B focuses on agentic reasoning and tool use within the Jan App ecosystem.

Finally, Microsoft’s Phi-4-mini-instruct demonstrates how 3.8B parameters can deliver competitive performance in math, logic, and multilingual tasks through high-quality synthetic data and alignment techniques.
 
 

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.



Source link

Continue Reading

Jobs & Careers

IBM Cloud to Eliminate Free Human Support and Pivot to Self-Service and AI

Published

on


IBM Cloud will overhaul its Basic Support tier, transitioning from free, human-led case support to a self-service model starting in January 2026, according to emails accessed by The Register. 

Under the current basic support, which is provided at no cost with Pay‑As‑You‑Go or Subscription accounts, customers can “raise cases with IBM’s support team 24×7.” However, no guaranteed response times or dedicated account managers are included. 

According to an email sent to affected customers, this upcoming change means Basic Support users will lose the ability to “open or escalate technical support cases through the portal or APIs.” 

Instead, they will still be able to “self‑report service issues (e.g., hardware or backup failures) via the Cloud Console” and lodge “billing and account cases in the IBM Cloud Support Portal,” the media house reported. 

IBM encourages users to adopt its Watsonx-powered IBM Cloud AI Assistant, which was upgraded earlier this year. The company also plans to introduce a “report an issue” tool in January 2026, promising “faster issue routing.” Additionally, an expanded library of documentation will provide deeper self‑help content.

The internal message reassures customers that “This no‑cost support level will shift to a self‑service model to align with industry standards and improve your support experience.” Still, for those requiring “technical support, faster response times, or severity‑level control,” IBM advises upgrading to a paid support plan, with pricing “starting at $200/month”.

While IBM claims the move brings its support structure in line with industry norms, the article notes that hyperscale cloud providers such as AWS, Google Cloud, and Microsoft Azure already offer similar self‑service tiers, with extra value like community forums, advisor tools, and usage‑based optimisation, without such drastic cuts to human support, as per news reports.

The post IBM Cloud to Eliminate Free Human Support and Pivot to Self-Service and AI appeared first on Analytics India Magazine.



Source link

Continue Reading

Trending