Jobs & Careers

Top 7 Small Language Models

Published

September 4, 2025

Abid Ali Awan

# Introduction

Small language models (SLMs) are quickly becoming the practical face of AI. They are getting faster, smarter, and far more efficient, delivering strong results with a fraction of the compute, memory, and energy that large models require.

A growing trend in the AI community is to use large language models (LLMs) to generate synthetic datasets, which are then used to fine-tune SLMs for specific tasks or particular styles. As a result, SLMs are becoming more capable and more specialized while staying compact. This opens up exciting possibilities: you can embed intelligent models directly into systems that don’t require a constant internet connection, enabling on-device intelligence for privacy, speed, and reliability.

In this article, we review seven of the top small language models making waves in the AI world, looking at their size and capabilities to help you understand which one offers the best balance for your needs.

# 1. google/gemma-3-270m-it

The Gemma 3 270M model is the smallest, most lightweight member of the Gemma 3 family, designed for efficiency and accessibility. With just 270 million parameters, it can run smoothly on devices with limited computational resources, making it ideal for experimentation, prototyping, and lightweight applications.

Despite its compact size, the 270M model supports a 32K context window and can handle a wide range of tasks such as basic question answering, summarization, and reasoning.
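To make "limited computational resources" concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at common precisions. It assumes the 270 million parameter count quoted above and standard bytes-per-parameter figures; it ignores activations, the KV cache, and runtime overhead, so real usage will be higher.

```python
# Bytes needed to store one parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


GEMMA_3_270M_PARAMS = 270_000_000  # parameter count quoted in the article

# Weights-only estimate per precision; activations and KV cache are excluded.
estimates = {
    precision: round(weight_memory_gb(GEMMA_3_270M_PARAMS, nbytes), 3)
    for precision, nbytes in BYTES_PER_PARAM.items()
}

for precision, gb in estimates.items():
    print(f"{precision:>9}: ~{gb} GB")
```

The same function can be applied to any other model in this list by swapping in its parameter count.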

# 2. Qwen/Qwen3-0.6B

The Qwen3-0.6B model is the most lightweight variant in the Qwen3 series, designed to deliver strong performance while remaining highly efficient and accessible. With 600 million parameters (0.44B non-embedding), it strikes a balance between capability and resource requirements.
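As a quick worked example of what the two figures imply, the difference between the total and the non-embedding count is the size of the embedding table. The sketch below uses the rounded numbers quoted above (expressed in millions), so the result is approximate.

```python
def embedding_share(total_millions: int, non_embedding_millions: int) -> tuple[int, float]:
    """Return (embedding parameters in millions, their share of the total)."""
    embedding = total_millions - non_embedding_millions
    return embedding, embedding / total_millions


# Rounded figures from the article: 0.6B total, 0.44B non-embedding.
emb, share = embedding_share(600, 440)
print(f"Embedding parameters: ~{emb}M ({share:.1%} of the total)")
```

At this scale, roughly a quarter of the parameters sit in the embedding table, which is why the non-embedding count is often reported separately for very small models.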

Qwen3-0.6B comes with the ability to seamlessly switch between “thinking mode” for complex reasoning, math, and coding, and “non-thinking mode” for fast, general-purpose dialogue. It supports a 32K context length and offers multilingual support across 100+ languages.
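When thinking mode is on, an application usually wants to separate the reasoning from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags at the start of the completion, as described in the Qwen3 model card; the sample strings are made up for illustration, and `split_thinking` is a hypothetical helper, not part of any library.

```python
def split_thinking(response: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split a raw completion into (reasoning, final_answer).

    If no closing tag is present, the whole response is treated as the answer.
    """
    reasoning, sep, answer = response.rpartition(close_tag)
    if not sep:
        # Non-thinking mode: nothing to strip.
        return "", response.strip()
    # Drop the opening tag and surrounding whitespace from the reasoning part.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Illustrative completions (not real model output).
thinking_output = "<think>\n17 * 3 = 51, plus 4 is 55.\n</think>\n\nThe result is 55."
plain_output = "Hello! How can I help?"

print(split_thinking(thinking_output))
print(split_thinking(plain_output))
```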

# 3. HuggingFaceTB/SmolLM3-3B

The SmolLM3-3B model is a small yet powerful open-source language model designed to push the limits of small-scale language models. With 3 billion parameters, it delivers strong performance in reasoning, math, coding, and multilingual tasks while remaining efficient enough for broader accessibility.

SmolLM3 supports dual-mode reasoning, allowing users to toggle between extended “thinking mode” for complex problem-solving and a faster, lightweight mode for general dialogue.

Beyond text generation, SmolLM3 also enables agentic usage with tool calling, making it versatile for real-world applications. As a fully open model with public training details, open weights, and checkpoints, SmolLM3 provides researchers and developers with a transparent, high-performance foundation for building reasoning-capable AI systems at the 3B–4B scale.
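Tool calling generally means the model emits a structured request that the host application parses and executes. The following is a minimal sketch of that host-side step, assuming the model returns a JSON object with `name` and `arguments` fields; the exact output format depends on the model's chat template, and `get_weather` is a hypothetical tool that returns canned data.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return f"Sunny in {city}"


# Registry mapping tool names the model may request to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> Any:
    """Parse a JSON tool call and run the matching registered function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Illustrative tool call (assumed format, not real model output).
model_output = '{"name": "get_weather", "arguments": {"city": "Copenhagen"}}'
result = dispatch_tool_call(model_output)
print(result)
```

In a full agent loop, the result would be appended to the conversation as a tool message so the model can use it in its next turn.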

# 4. Qwen/Qwen3-4B-Instruct-2507

The Qwen3-4B-Instruct-2507 model is an updated instruction-tuned variant of the Qwen3-4B series, designed to deliver stronger performance in non-thinking mode. With 4 billion parameters (3.6B non-embedding), it introduces major improvements across instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage, while also expanding long-tail knowledge coverage across multiple languages.

Unlike other Qwen3 models, this version is optimized exclusively for non-thinking mode, ensuring faster, more efficient responses without generating reasoning tokens. It also demonstrates better alignment with user preferences, excelling in open-ended and creative tasks such as writing, dialogue, and subjective reasoning.

# 5. google/gemma-3-4b-it

The Gemma 3 4B model is an instruction-tuned, multimodal member of the Gemma 3 family, designed to handle both text and image inputs while generating high-quality text outputs. With 4 billion parameters and support for a 128K token context window, it is well-suited for tasks such as question answering, summarization, reasoning, and detailed image understanding.

It is also widely used as a base for fine-tuning on text classification, image classification, or other specialized tasks, which improves its performance in specific domains.

# 6. janhq/Jan-v1-4B

The Jan-v1 model is the first release in the Jan Family, built specifically for agentic reasoning and problem-solving within the Jan App. Based on the Lucy model and powered by the Qwen3-4B-thinking architecture, Jan-v1 delivers enhanced reasoning capabilities, tool utilization, and improved performance on complex agentic tasks.

By scaling the model and fine-tuning its parameters, it has achieved an impressive accuracy of 91.1% on SimpleQA. This marks a significant milestone in factual question answering for models of this size. It is optimized for local use with the Jan app, vLLM, and llama.cpp, with recommended settings to enhance performance.

# 7. microsoft/Phi-4-mini-instruct

The Phi-4-mini-instruct model is a lightweight 3.8B parameter language model from Microsoft’s Phi-4 family, designed for efficient reasoning, instruction following, and safe deployment in both research and commercial applications.

Trained on a mix of 5T tokens from high-quality filtered web data, synthetic “textbook-like” reasoning data, and curated supervised instruction data, it supports a 128K token context length and excels in math, logic, and multilingual tasks.

Phi-4-mini-instruct also supports function calling, multilingual generation (20+ languages), and integration with frameworks like vLLM and Transformers, enabling flexible deployment.

# Conclusion

This article explores a new wave of lightweight yet powerful open models that are reshaping the AI landscape by balancing efficiency, reasoning, and accessibility.

From Google’s Gemma 3 family with the ultra-compact gemma-3-270m-it and the multimodal gemma-3-4b-it, to Qwen’s Qwen3 series with the efficient Qwen3-0.6B and the long-context, instruction-optimized Qwen3-4B-Instruct-2507, these models highlight how scaling and fine-tuning can unlock strong reasoning and multilingual capabilities in smaller footprints.

SmolLM3-3B pushes the boundaries of small models with dual-mode reasoning and long-context support, while Jan-v1-4B focuses on agentic reasoning and tool use within the Jan App ecosystem.

Finally, Microsoft’s Phi-4-mini-instruct demonstrates how 3.8B parameters can deliver competitive performance in math, logic, and multilingual tasks through high-quality synthetic data and alignment techniques.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.

Visa Launches MCP Server and Agent Toolkit to Advance Agentic Commerce

Published

September 4, 2025

Siddharth Jindal

Visa has expanded its Intelligent Commerce program with the introduction of a Model Context Protocol (MCP) Server and a Visa Acceptance Agent Toolkit, designed to help developers and business users connect AI agents directly to Visa’s network.

The MCP Server allows developers to link AI agents and large language models with Visa Intelligent Commerce APIs, creating a standardised and secure way to integrate payments. “For AI agents and LLMs to interact with Visa’s trusted network, they need a secure, consistent way to communicate with our services,” the company said in its announcement.

According to Visa, the MCP Server eliminates the need for custom-built integrations, accelerates prototype development, and allows agents to dynamically apply Visa APIs to commerce tasks. Early adopters within Visa have already used the technology to streamline generative AI workflows.

The company also announced the pilot of the Visa Acceptance Agent Toolkit, which runs on the MCP Server. It is designed to let both developers and non-technical users complete commerce tasks in plain language without coding.

“Now available in pilot, the Visa Acceptance Agent Toolkit empowers both developers and business users to put agentic commerce into action — without writing a single line of code,” Visa noted.

Initial use cases include creating invoices and summarising transaction data through natural language commands. For example, a user could request: “Create an invoice for $100 for John Doe, due Friday,” and the agent would process the request through Visa’s Invoice API.
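For readers unfamiliar with MCP, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below shows what such a request could look like for the invoice example; the tool name `create_invoice` and its argument fields are hypothetical placeholders, not Visa's actual schema, which the announcement does not describe.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP-style JSON-RPC 2.0 `tools/call` request as a JSON string."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message, indent=2)


# Hypothetical tool name and fields, derived from the plain-language request.
payload = build_tool_call(
    request_id=1,
    tool_name="create_invoice",
    arguments={"amount": 100, "currency": "USD", "customer_name": "John Doe", "due": "Friday"},
)
print(payload)
```

The agent's job is to turn the plain-language request into the structured arguments; the server then validates them and calls the underlying API.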

The Toolkit is currently available as a self-hosted package via npm for JavaScript developers, with all actions routed through the MCP Server under Visa’s security and access controls.

Visa said both the MCP Server and Toolkit remain in pilot while the company explores further B2B and B2C applications. “Trust is crucial for enabling AI commerce,” Visa stated, adding that its decades of work with machine learning and datasets position it to support secure, next-generation payments at scale.

The post Visa Launches MCP Server and Agent Toolkit to Advance Agentic Commerce appeared first on Analytics India Magazine.

IBM Cloud to Eliminate Free Human Support and Pivot to Self-Service and AI

Published

September 4, 2025

Smruthi Nadig

IBM Cloud will overhaul its Basic Support tier, transitioning from free, human-led case support to a self-service model starting in January 2026, according to emails accessed by The Register.

Under the current Basic Support tier, which is provided at no cost with Pay‑As‑You‑Go or Subscription accounts, customers can “raise cases with IBM’s support team 24×7.” However, no guaranteed response times or dedicated account managers are included.

According to an email sent to affected customers, this upcoming change means Basic Support users will lose the ability to “open or escalate technical support cases through the portal or APIs.”

Instead, they will still be able to “self‑report service issues (e.g., hardware or backup failures) via the Cloud Console” and lodge “billing and account cases in the IBM Cloud Support Portal,” The Register reported.

IBM encourages users to adopt its Watsonx-powered IBM Cloud AI Assistant, which was upgraded earlier this year. The company also plans to introduce a “report an issue” tool in January 2026, promising “faster issue routing.” Additionally, an expanded library of documentation will provide deeper self‑help content.

The email reassures customers that “This no‑cost support level will shift to a self‑service model to align with industry standards and improve your support experience.” Still, for those requiring “technical support, faster response times, or severity‑level control,” IBM advises upgrading to a paid support plan, with pricing “starting at $200/month”.

While IBM claims the move brings its support structure in line with industry norms, The Register notes that hyperscale cloud providers such as AWS, Google Cloud, and Microsoft Azure already offer similar self‑service tiers with extra value, such as community forums, advisor tools, and usage‑based optimisation, without such drastic cuts to human support.

The post IBM Cloud to Eliminate Free Human Support and Pivot to Self-Service and AI appeared first on Analytics India Magazine.

Neo4j Launches Infinigraph for 100TB+ Unified Graph Workloads

Published

September 4, 2025

Ankush Das

Neo4j has launched Infinigraph, a new distributed graph database architecture designed to run both transactional and analytical workloads in one system at 100TB+ scale.

The platform enables enterprises to store and analyse billions of relationships and run thousands of concurrent queries in real-time. It supports use cases such as embedding tens of millions of documents as vectors for context-aware assistants, global fraud detection, large product catalogues, and compliance analysis.

By merging operational and analytical workloads, Infinigraph addresses the long-standing challenge of data silos. Enterprises often run separate transactional and analytical systems, leading to cost overheads and delays. Neo4j claims its approach removes ETL pipelines, sync delays, and redundancy.

“Infinigraph sets a new standard for enterprise graph databases: one system that runs real-time operations and deep analytics together, at full fidelity and massive scale,” Sudhir Hasbe, president of technology at Neo4j, said in the press release.

Customers such as Intuit and Dun & Bradstreet are already exploring its potential. Chad Cloes, staff software engineer at Intuit, said in the statement that the company needs to scale without compromising performance, adding that Infinigraph could help meet those demands.

The company claims that the system introduces sharding to distribute property data across cluster members while maintaining the graph’s logical integrity. Key benefits include horizontal scaling beyond 100TB, embedding billions of vectors, high availability through autonomous clustering, and cost flexibility with separate billing for compute and storage.

Infinigraph is available in Neo4j’s Enterprise Edition and will be rolled out soon to AuraDB, its cloud-native platform.

The post Neo4j Launches Infinigraph for 100TB+ Unified Graph Workloads appeared first on Analytics India Magazine.
