Jobs & Careers

MongoDB Launches New Voyage AI Embedding and Reranking Models with MCP Server

Published

3 weeks ago

August 12, 2025

MongoDB today announced product updates and an expanded partner ecosystem at Ai4, its annual AI-focused conference to help customers build AI applications at scale.

The company introduced new Voyage AI embedding and reranking models, launched the MongoDB Model Context Protocol (MCP) Server in public preview, and added new partners to its AI ecosystem.

The latest Voyage AI models include voyage-context-3, which processes full document context for more relevant retrieval results, and voyage-3.5 and voyage-3.5-lite, which aim to improve retrieval quality and price-performance.

MongoDB also launched rerank-2.5 and rerank-2.5-lite, allowing developers to guide the reranking process using instructions.

“Databases are more central than ever to the technology stack in the age of AI,” said Andrew Davidson, SVP of products at MongoDB. “By consolidating the AI data stack and building a cutting-edge AI ecosystem, we’re giving developers the tools they need to build and deploy trustworthy AI solutions faster.”

The MCP Server enables direct connections between MongoDB and tools such as GitHub Copilot, Anthropic’s Claude, Cursor, and Windsurf, allowing developers to manage database operations using natural language.

Since its preview launch, thousands of developers have adopted MCP, with growing enterprise interest for agentic application stacks.

“Many organisations struggle to scale AI because the models themselves aren’t up to the task,” Fred Roma, SVP of engineering at MongoDB, said. “The quality of your embedding and reranking models is often the difference between a promising prototype and an AI application that delivers meaningful results in production.”

MongoDB has expanded its AI partner ecosystem with three new additions.

Galileo provides AI reliability and observability through continuous evaluations and monitoring of applications. Temporal enables the orchestration of resilient AI workflows with durable execution and horizontal scaling. LangChain offers integrations such as GraphRAG with MongoDB Atlas and natural language querying capabilities.

MongoDB reported that in the last 18 months, enterprise adopters such as Vonage, LGU+, and The Financial Times, along with approximately 8,000 startups, including Laurel and Mercor, have used its platform for AI projects. Over 200,000 new developers register for MongoDB Atlas monthly.

Source link

Jobs & Careers

Top 7 Small Language Models

Published

58 minutes ago

September 4, 2025

Abid Ali Awan

Image by Author

# Introduction

Small language models (SLMs) are quickly becoming the practical face of AI. They are getting faster, smarter, and far more efficient, delivering strong results with a fraction of the compute, memory, and energy that large models require.

A growing trend in the AI community is to use large language models (LLMs) to generate synthetic datasets, which are then used to fine-tune SLMs for specific tasks or to adopt particular styles. As a result, SLMs are becoming smarter, faster, and more specialized, all while maintaining a compact size. This opens up exciting possibilities: you can now embed intelligent models directly into systems that don’t require a constant internet connection, enabling on-device intelligence for privacy, speed, and reliability.

In this tutorial, we will review some of the top small language models making waves in the AI world. We will compare their size and performance, helping you understand which models offer the best balance for your needs.

# 1. google/gemma-3-270m-it

The Gemma 3 270M model is the smallest and most ultra-lightweight member of the Gemma 3 family, designed for efficiency and accessibility. With just 270 million parameters, it can run smoothly on devices with limited computational resources, making it ideal for experimentation, prototyping, and lightweight applications.

Despite its compact size, the 270M model supports a 32K context window and can handle a wide range of tasks such as basic question answering, summarization, and reasoning.

# 2. Qwen/Qwen3-0.6B

The Qwen3-0.6B model is the most lightweight variant in the Qwen3 series, designed to deliver strong performance while remaining highly efficient and accessible. With 600 million parameters (0.44B non-embedding), it strikes a balance between capability and resource requirements.

Qwen3-0.6B comes with the ability to seamlessly switch between “thinking mode” for complex reasoning, math, and coding, and “non-thinking mode” for fast, general-purpose dialogue. It supports a 32K context length and offers multilingual support across 100+ languages.

# 3. HuggingFaceTB/SmolLM3-3B

The SmolLM3-3B model is a small yet powerful open-source language model designed to push the limits of small-scale language models. With 3 billion parameters, it delivers strong performance in reasoning, math, coding, and multilingual tasks while remaining efficient enough for broader accessibility.

SmolLM3 supports dual-mode reasoning, allowing users to toggle between extended “thinking mode” for complex problem-solving and a faster, lightweight mode for general dialogue.

Beyond text generation, SmolLM3 also enables agentic usage with tool calling, making it versatile for real-world applications. As a fully open model with public training details, open weights, and checkpoints, SmolLM3 provides researchers and developers with a transparent, high-performance foundation for building reasoning-capable AI systems at the 3B–4B scale.

# 4. Qwen/Qwen3-4B-Instruct-2507

The Qwen3-4B-Instruct-2507 model is an updated instruction-tuned variant of the Qwen3-4B series, designed to deliver stronger performance in non-thinking mode. With 4 billion parameters (3.6B non-embedding), it introduces major improvements across instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage, while also expanding long-tail knowledge coverage across multiple languages.

Unlike other Qwen3 models, this version is optimized exclusively for non-thinking mode, ensuring faster, more efficient responses without generating reasoning tokens. It also demonstrates better alignment with user preferences, excelling in open-ended and creative tasks such as writing, dialogue, and subjective reasoning.

# 5. google/gemma-3-4b-it

The Gemma 3 4b model is an instruction-tuned, multimodal member of the Gemma 3 family, designed to handle both text and image inputs while generating high-quality text outputs. With 4 billion parameters and support for a 128K token context window, it is well-suited for tasks such as question answering, summarization, reasoning, and detailed image understanding.

Importantly, it is highly used for fine-tuning on text classification, image classification, or specialized tasks, which further improves the model’s specialization and performance for certain domains.

# 6. janhq/Jan-v1-4B

The Jan-v1 model is the first release in the Jan Family, built specifically for agentic reasoning and problem-solving within the Jan App. Based on the Lucy model and powered by the Qwen3-4B-thinking architecture, Jan-v1 delivers enhanced reasoning capabilities, tool utilization, and improved performance on complex agentic tasks.

By scaling the model and fine-tuning its parameters, it has achieved an impressive accuracy of 91.1% on SimpleQA. This marks a significant milestone in factual question answering for models of this size. It is optimized for local use with the Jan app, vLLM, and llama.cpp, with recommended settings to enhance performance.

# 7. microsoft/Phi-4-mini-instruct

The Phi-4-mini-instruct model is a lightweight 3.8B parameter language model from Microsoft’s Phi-4 family, designed for efficient reasoning, instruction following, and safe deployment in both research and commercial applications.

Trained on a mix of 5T tokens from high-quality filtered web data, synthetic “textbook-like” reasoning data, and curated supervised instruction data, it supports a 128K token context length and excels in math, logic, and multilingual tasks.

Phi-4-mini-instruct also supports function calling, multilingual generation (20+ languages), and integration with frameworks like vLLM and Transformers, enabling flexible deployment.

# Conclusion

This article explores a new wave of lightweight yet powerful open models that are reshaping the AI landscape by balancing efficiency, reasoning, and accessibility.

From Google’s Gemma 3 family with the ultra-compact gemma-3-270m-it and the multimodal gemma-3-4b-it, to Qwen’s Qwen3 series with the efficient Qwen3-0.6B and the long-context, instruction-optimized Qwen3-4B-Instruct-2507, these models highlight how scaling and fine-tuning can unlock strong reasoning and multilingual capabilities in smaller footprints.

SmolLM3-3B pushes the boundaries of small models with dual-mode reasoning and long-context support, while Jan-v1-4B focuses on agentic reasoning and tool use within the Jan App ecosystem.

Finally, Microsoft’s Phi-4-mini-instruct demonstrates how 3.8B parameters can deliver competitive performance in math, logic, and multilingual tasks through high-quality synthetic data and alignment techniques.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.

Source link

Jobs & Careers

IBM Cloud to Eliminate Free Human Support and Pivot to Self-Service and AI

Published

2 hours ago

September 4, 2025

Smruthi Nadig

IBM Cloud will overhaul its Basic Support tier, transitioning from free, human-led case support to a self-service model starting in January 2026, according to emails accessed by The Register.

Under the current basic support, which is provided at no cost with Pay‑As‑You‑Go or Subscription accounts, customers can “raise cases with IBM’s support team 24×7.” However, no guaranteed response times or dedicated account managers are included.

According to an email sent to affected customers, this upcoming change means Basic Support users will lose the ability to “open or escalate technical support cases through the portal or APIs.”

Instead, they will still be able to “self‑report service issues (e.g., hardware or backup failures) via the Cloud Console” and lodge “billing and account cases in the IBM Cloud Support Portal,” the media house reported.

IBM encourages users to adopt its Watsonx-powered IBM Cloud AI Assistant, which was upgraded earlier this year. The company also plans to introduce a “report an issue” tool in January 2026, promising “faster issue routing.” Additionally, an expanded library of documentation will provide deeper self‑help content.

The internal message reassures customers that “This no‑cost support level will shift to a self‑service model to align with industry standards and improve your support experience.” Still, for those requiring “technical support, faster response times, or severity‑level control,” IBM advises upgrading to a paid support plan, with pricing “starting at $200/month”.

While IBM claims the move brings its support structure in line with industry norms, the article notes that hyperscale cloud providers such as AWS, Google Cloud, and Microsoft Azure already offer similar self‑service tiers, with extra value like community forums, advisor tools, and usage‑based optimisation, without such drastic cuts to human support, as per news reports.

The post IBM Cloud to Eliminate Free Human Support and Pivot to Self-Service and AI appeared first on Analytics India Magazine.

Source link

Jobs & Careers

Neo4j Launches Infinigraph for 100TB+ Unified Graph Workloads

Published

2 hours ago

September 4, 2025

Ankush Das

Neo4j has launched Infinigraph, a new distributed graph database architecture designed to run both transactional and analytical workloads in one system at 100TB+ scale.

The platform enables enterprises to store and analyse billions of relationships and run thousands of concurrent queries in real-time. It supports use cases such as embedding tens of millions of documents as vectors for context-aware assistants, global fraud detection, large product catalogues, and compliance analysis.

By merging operational and analytical workloads, Infinigraph addresses the long-standing challenge of data silos. Enterprises often run separate transactional and analytical systems, leading to cost overheads and delays. Neo4j claims its approach removes ETL pipelines, sync delays, and redundancy.

“Infinigraph sets a new standard for enterprise graph databases: one system that runs real-time operations and deep analytics together, at full fidelity and massive scale,” Sudhir Hasbe, president of technology at Neo4j, said in the press release.

Customers such as Intuit and Dun & Bradstreet are already exploring its potential. Chad Cloes, staff software engineer at Intuit, said in the statement that the company needs to scale without compromising performance, adding that Infinigraph could help meet those demands.

The company claims that the system introduces sharding to distribute property data across cluster members while maintaining the graph’s logical integrity. Key benefits include horizontal scaling beyond 100TB, embedding billions of vectors, high availability through autonomous clustering, and cost flexibility with separate billing for compute and storage.

Infinigraph is available in Neo4j’s Enterprise Edition and will be rolled out soon to AuraDB, its cloud-native platform.

The post Neo4j Launches Infinigraph for 100TB+ Unified Graph Workloads appeared first on Analytics India Magazine.

Source link