7 Popular LLMs Explained in 7 Minutes


We use large language models in many of our daily tasks. These models are trained on billions of online documents and diverse datasets, making them capable of understanding and responding in human-like language. However, not all LLMs are built the same way. While the core idea is similar, they differ in their underlying architectures, and these variations have a significant impact on their capabilities. For example, as seen across various benchmarks, DeepSeek excels at reasoning tasks, Claude performs well in coding, and ChatGPT stands out in creative writing.

In this article, I’ll walk you through 7 popular LLM architectures to give you a clear overview, all in just as many minutes. So, let’s get started.

 

1. BERT

 
Paper Link: https://arxiv.org/pdf/1810.04805
Developed by Google in 2018, BERT marked a significant shift in natural language understanding by introducing deep bidirectional attention in language modeling. Unlike previous models that read text left-to-right or right-to-left, BERT uses a transformer encoder to consider both directions simultaneously. It is trained on two tasks: masked language modeling (predicting randomly masked words) and next-sentence prediction (determining whether one sentence logically follows another). Architecturally, BERT comes in two sizes: BERT Base (12 layers, 110M parameters) and BERT Large (24 layers, 340M parameters). Its structure relies solely on encoder stacks and includes special tokens such as [CLS], which represents the full sentence, and [SEP], which separates two sentences. You can fine-tune it for tasks like sentiment analysis and question answering (e.g., on SQuAD). It was among the first widely adopted models to capture the meaning of a sentence from both directions at once.
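To make the two pre-training tasks concrete, here is a minimal sketch of how a BERT-style input is assembled. The tokens and masking rate are illustrative (a real pipeline would use a WordPiece tokenizer and BERT's full 80/10/10 masking scheme); this only shows the [CLS]/[SEP] packing and the masked-word labels.

```python
# Sketch of BERT-style input construction (illustrative, not Google's code).
import random

CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"

def build_input(sentence_a, sentence_b):
    """Pack a sentence pair the way BERT expects: [CLS] A [SEP] B [SEP]."""
    tokens = [CLS] + sentence_a + [SEP] + sentence_b + [SEP]
    # Segment IDs tell the model which sentence each token belongs to,
    # which is what the next-sentence prediction task relies on.
    segments = [0] * (len(sentence_a) + 2) + [1] * (len(sentence_b) + 1)
    return tokens, segments

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly hide ordinary tokens; the model must predict the originals."""
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if tok not in (CLS, SEP) and rng.random() < mask_prob:
            labels[i] = tok          # ground-truth word to recover
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked, labels

tokens, segments = build_input(["the", "cat", "sat"], ["it", "was", "tired"])
masked, labels = mask_tokens(tokens)
print(tokens)
print(segments)
```

Because BERT sees the unmasked words on both sides of each [MASK], its predictions are bidirectional, unlike a left-to-right language model.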

 

2. GPT

 
Paper Link (GPT 4): https://arxiv.org/pdf/2303.08774
The GPT (Generative Pre-trained Transformer) family was introduced by OpenAI. The series began with GPT-1 in 2018 and evolved to GPT-4 by 2023; the latest version, GPT-4o, released in May 2024, added multimodal capabilities, handling both text and images. The models are pre-trained on very large text corpora with a standard next-token prediction objective: at each step, the model predicts the next word in a sequence given all previous words. After this unsupervised pre-training stage, the same model can be fine-tuned on specific tasks or used zero-/few-shot via prompting, with little or no additional training. The decoder-only design means GPT attends only to previous tokens, unlike BERT’s bidirectional encoder. What was notable at introduction was the sheer scale and capability of GPT: as each successive generation (GPT‑2, GPT‑3) grew larger, the models demonstrated very fluent text generation and few-shot learning, establishing the “pre-train and prompt/fine-tune” paradigm for large language models. However, they are proprietary, with access typically provided via APIs, and their exact architectures, especially for recent versions, are not fully disclosed.
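The "attends only to previous tokens" property comes from a causal attention mask. A minimal sketch (illustrative, not OpenAI's implementation): position i may look at positions 0..i and never ahead, which is what makes left-to-right generation possible.

```python
# Causal mask used by decoder-only models such as GPT (illustrative sketch).
def causal_mask(n):
    """n x n matrix: 1 where attention is allowed, 0 where it is blocked."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
# Row i allows exactly i + 1 positions, so during training every position
# can be predicted in parallel without ever "seeing the future".
```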

 

3. LLaMA

 
LLaMA 4 Blog Link: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
Paper Link (LLaMA 3) : https://arxiv.org/abs/2407.21783
LLaMA, developed by Meta AI and first released in February 2023, is a family of openly released decoder-only transformer models; the latest version, Llama 4, arrived in April 2025. Like GPT, LLaMA uses a decoder-only Transformer architecture (each model is an autoregressive Transformer) but with some architectural tweaks: the original LLaMA models used the SwiGLU activation instead of GeLU, rotary positional embeddings (RoPE) instead of fixed ones, and RMSNorm in place of layer normalization. The family shipped in multiple sizes, from 7B up to 65B parameters in LLaMA 1 and larger still in later generations, to make large-scale models more accessible. Notably, despite relatively modest parameter counts, these models performed competitively with much larger contemporaries: Meta reported that LLaMA’s 13B model outperformed OpenAI’s 175B-parameter GPT-3 on many benchmarks, and its 65B model was competitive with Google’s PaLM and DeepMind’s Chinchilla. LLaMA’s open (though initially research-restricted) release spawned extensive community use; its key novelty was combining efficient training at scale with more open access to model weights.
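Of the tweaks listed above, RMSNorm is the easiest to show in a few lines. A pure-Python sketch (real implementations operate on tensors with a learned gain vector): unlike LayerNorm, it does not subtract the mean; it only rescales by the root-mean-square of the activations, which is cheaper and works well in practice.

```python
# Minimal RMSNorm, the normalization LLaMA uses in place of LayerNorm.
import math

def rms_norm(x, gain=None, eps=1e-6):
    """Normalize a vector by its RMS, then apply an optional learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    gain = gain or [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]

out = rms_norm([1.0, -2.0, 3.0])
# After normalization, the vector's RMS is approximately 1.
print(math.sqrt(sum(v * v for v in out) / len(out)))
```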

 

4. PaLM

 
PaLM 2 Technical Report: https://arxiv.org/abs/2305.10403
Paper Link (PaLM): https://arxiv.org/pdf/2204.02311
PaLM (Pathways Language Model) is a series of large language models developed by Google Research. The original PaLM (announced in 2022) was a 540-billion-parameter, decoder-only Transformer and is part of Google’s Pathways system. It was trained on a high-quality corpus of 780 billion tokens across thousands of TPU v4 chips in Google’s infrastructure, employing data and model parallelism to achieve high hardware utilization. The model also uses multi-query attention to reduce memory bandwidth requirements during inference. PaLM is known for its few-shot learning capabilities, performing well on new tasks with minimal examples thanks to its huge and diverse training data, which includes webpages, books, Wikipedia, news, GitHub code, and social media conversations. PaLM 2, announced in May 2023, further improved multilingual, reasoning, and coding capabilities, powering applications like Google Bard and Workspace AI features.
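Why multi-query attention helps at inference time can be shown with back-of-the-envelope arithmetic: the KV cache stores keys and values per attention head, and MQA shares a single key/value head across all query heads. The layer and head counts below are illustrative, not PaLM's actual configuration.

```python
# Rough KV-cache size comparison: standard multi-head vs multi-query attention.
def kv_cache_entries(seq_len, n_layers, n_kv_heads, head_dim):
    """Number of cached values during generation (x2 for keys and values)."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim

# Hypothetical model: 48 layers, 48 attention heads, head dimension 128.
multi_head = kv_cache_entries(2048, 48, n_kv_heads=48, head_dim=128)
multi_query = kv_cache_entries(2048, 48, n_kv_heads=1, head_dim=128)
print(multi_head // multi_query)  # cache shrinks by a factor of n_heads
```

The attention computation itself is nearly unchanged; only the memory traffic for cached keys and values drops, which is often the bottleneck when serving long sequences.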

 

5. Gemini

 
Gemini 2.5 Blog: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
Paper Link (Gemini 1.5): https://arxiv.org/abs/2403.05530
Paper Link (Gemini): https://arxiv.org/abs/2312.11805
Gemini is Google’s next-generation LLM family (from Google DeepMind and Google Research), introduced in late 2023. Gemini models are natively multimodal, meaning they are designed from the ground up to handle text, images, audio, video, and even code in one model. Like PaLM and GPT, Gemini is based on the Transformer, but its key features include massive scale, support for extremely long contexts, and (in Gemini 1.5) a Mixture-of-Experts (MoE) architecture for efficiency. For example, Gemini 1.5 (“Pro”) uses sparsely activated expert layers (hundreds of expert sub-networks, with only a few active per input) to boost capacity without proportional compute cost. The Gemini 2.5 series, launched in March 2025, built upon this foundation with even deeper “thinking” capabilities. In June 2025, Google released Gemini 2.5 Flash and Pro as stable models and previewed Flash‑Lite, their most cost-efficient, fastest version yet, optimized for high-throughput tasks while still supporting the million-token context window and tool integrations like search and code execution. The Gemini family comes in multiple sizes (Ultra, Pro, Nano) so it can run from cloud servers down to mobile devices. The combination of multimodal pretraining and MoE-based scaling makes Gemini a flexible, highly capable foundation model.
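The "only a few experts active per input" idea boils down to top-k routing. A toy sketch (expert counts and scores are made up; this is not Gemini's actual router): a small router network scores every expert for each token, but only the k best-scoring experts actually run.

```python
# Toy top-k expert routing, the core of a sparse Mixture-of-Experts layer.
def top_k_experts(router_scores, k=2):
    """Return indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])

scores = [0.1, 2.3, -0.5, 1.7, 0.0, 0.2, 1.1, -1.0]  # 8 experts, one token
active = top_k_experts(scores, k=2)
print(active)  # only these expert sub-networks execute for this token
```

Capacity scales with the total number of experts, while per-token compute scales only with k, which is why MoE layers can grow a model without a proportional increase in inference cost.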

 

6. Mistral

 
Paper Link (Mistral 7B): https://arxiv.org/abs/2310.06825
Mistral AI is a French startup that released its first LLMs in 2023. Its flagship model, Mistral 7B (September 2023), is a 7.3-billion-parameter decoder-only Transformer. Architecturally, it is similar to a GPT-style model but includes optimizations for inference: grouped-query attention (GQA) to speed up self-attention and sliding-window attention to handle longer contexts more efficiently. In terms of performance, Mistral 7B outperformed Meta’s Llama 2 13B and gave strong results even against 34B models, while being much smaller. Mistral AI released the model under an Apache 2.0 license, making it freely available for use. Its next major release was Mixtral 8×7B, a sparse Mixture-of-Experts (MoE) model with eight 7B-parameter expert networks per layer. This design helped Mixtral match or beat GPT‑3.5 and LLaMA 2 70B on tasks like mathematics, coding, and multilingual benchmarks. In May 2025, Mistral released Mistral Medium 3, a proprietary mid-sized model aimed at enterprises. It delivers over 90% of the score of pricier models like Claude 3.7 Sonnet on standard benchmarks while reducing per-token cost dramatically (approximately \$0.40 vs \$3.00 for Sonnet per million input tokens). It supports multimodal tasks (text + images) and professional reasoning, and is offered through an API or for on-prem deployment on as few as four GPUs. However, unlike earlier models, Medium 3 is closed-source, prompting community criticism that Mistral is moving away from its open-source ethos. Shortly after, in June 2025, Mistral introduced Magistral, its first model dedicated to explicit reasoning. The small version is open under Apache 2.0, while Magistral Medium is enterprise-only. Magistral Medium scored 73.6% on AIME 2024, with the small version scoring 70.7%, demonstrating strong math and logic skills across multiple languages.
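Sliding-window attention can be sketched the same way as a causal mask, with one extra constraint: each position attends only to the previous `window` tokens rather than the entire prefix, so attention cost grows with the window size instead of the sequence length. The window of 3 below is illustrative (Mistral 7B's actual window is 4096).

```python
# Sliding-window attention mask as used in Mistral 7B (illustrative sketch).
def sliding_window_mask(n, window):
    """1 where position i may attend to position j, else 0."""
    return [[1 if i - window < j <= i else 0 for j in range(n)]
            for i in range(n)]

for row in sliding_window_mask(6, window=3):
    print(row)
# Information from further back still flows forward indirectly: stacking
# layers lets a token see a window whose tokens saw an earlier window.
```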

 

7. DeepSeek

 
Paper Link (DeepSeek-R1): https://arxiv.org/abs/2501.12948
DeepSeek is a Chinese AI company (a spin-off of High-Flyer AI, founded in 2023) that develops large LLMs. Its recent models, such as DeepSeek v3 and DeepSeek-R1, employ a sparsely activated Mixture-of-Experts Transformer architecture: each Transformer layer contains hundreds of expert sub-networks, but only a few (around 9 of 257) are activated per token, depending on what each input needs. This lets DeepSeek have a huge total model size (over 670 billion parameters) while using only about 37 billion per response, making it much faster and cheaper to run than a dense model of similar size. Like other modern LLMs, it uses SwiGLU activations, rotary embeddings (RoPE), and advanced optimizations (including experimental FP8 precision during training) for efficiency. This aggressive MoE design lets DeepSeek achieve very high capability, comparable to much larger dense models, at lower compute cost. DeepSeek’s models, released under open licenses, attracted attention for rivaling leading models like GPT-4 in multilingual generation and reasoning while significantly reducing training and inference resource requirements.
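The arithmetic behind the "670B total, ~37B active" claim is worth a quick check, using the figures quoted above (the structure of the calculation is illustrative; the real split of parameters across shared and routed experts is more involved).

```python
# Back-of-the-envelope sparsity arithmetic for a DeepSeek-style MoE model.
total_params = 670e9      # all experts across all layers (as quoted above)
active_params = 37e9      # parameters actually executed per token

print(f"active fraction of weights: {active_params / total_params:.1%}")

# With roughly 257 experts per layer and 9 routed per token (as quoted),
# only a small slice of each MoE layer runs:
experts_total, experts_active = 257, 9
print(f"experts used per layer: {experts_active / experts_total:.1%}")
```

Both ratios land in the same few-percent range, which is the whole point: per-token compute looks like a ~37B dense model even though the capacity is that of a 670B one.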
 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.



‘Reliance Intelligence’ is Here, In Partnership with Google and Meta 

Reliance Industries chairman Mukesh Ambani has announced the launch of Reliance Intelligence, a new wholly owned subsidiary focused on artificial intelligence, marking what he described as the company’s “next transformation into a deep-tech enterprise.”

Addressing shareholders, Ambani said Reliance Intelligence had been conceived with four core missions—building gigawatt-scale AI-ready data centres powered by green energy, forging global partnerships to strengthen India’s AI ecosystem, delivering AI services for consumers and SMEs in critical sectors such as education, healthcare, and agriculture, and creating a home for world-class AI talent.

Work has already begun on gigawatt-scale AI data centres in Jamnagar, Ambani said, adding that they would be rolled out in phases in line with India’s growing needs. 

These facilities, powered by Reliance’s new energy ecosystem, will be purpose-built for AI training and inference at a national scale.

Ambani also announced a “deeper, holistic partnership” with Google, aimed at accelerating AI adoption across Reliance businesses. 

“We are marrying Reliance’s proven capability to build world-class assets and execute at India scale with Google’s leading cloud and AI technologies,” Ambani said.

Google CEO Sundar Pichai, in a recorded message, said the two companies would set up a new cloud region in Jamnagar dedicated to Reliance.

“It will bring world-class AI and compute from Google Cloud, powered by clean energy from Reliance and connected by Jio’s advanced network,” Pichai said. 

He added that Google Cloud would remain Reliance’s largest public cloud partner, supporting mission-critical workloads and co-developing advanced AI initiatives.

Ambani further unveiled a new AI-focused joint venture with Meta. 

He said the venture would combine Reliance’s domain expertise across industries with Meta’s open-source AI models and tools to deliver “sovereign, enterprise-ready AI for India.”

Meta founder and CEO Mark Zuckerberg, in his remarks, said the partnership is aimed to bring open-source AI to Indian businesses at scale. 

“With Reliance’s reach and scale, we can bring this to every corner of India. This venture will become a model for how AI, and one day superintelligence, can be delivered,” Zuckerberg said.

Ambani also highlighted Reliance’s investments in AI-powered robotics, particularly humanoid robotics, which he said could transform manufacturing, supply chains and healthcare. 

“Intelligent automation will create new industries, new jobs and new opportunities for India’s youth,” he told shareholders.

Calling AI an opportunity “as large, if not larger” than Reliance’s digital services push a decade ago, Ambani said Reliance Intelligence would work to deliver “AI everywhere and for every Indian.”

“We are building for the next decade with confidence and ambition,” he said, underscoring that the company’s partnerships, green infrastructure and India-first governance approach would be central to this strategy.

The post ‘Reliance Intelligence’ is Here, In Partnership with Google and Meta  appeared first on Analytics India Magazine.



Cognizant, Workfabric AI to Train 1,000 Context Engineers

Cognizant has announced that it would deploy 1,000 context engineers over the next year to industrialise agentic AI across enterprises.

According to an official release, the company claimed that the move marks a “pivotal investment” in the emerging discipline of context engineering. 

As part of this initiative, Cognizant said it is partnering with Workfabric AI, the company building the context engine for enterprise AI. 

Cognizant’s context engineers will be powered by Workfabric AI’s ContextFabric platform, the statement said, adding that the platform transforms the organisational DNA of enterprises (how their teams work, including their workflows, data, rules, and processes) into actionable context for AI agents.


Mastercard, Infosys Join Hands to Enhance Cross-Border Payments

Infosys has announced a partnership with Mastercard to make cross-border payments faster and easier for banks and financial institutions.

The collaboration will give institutions quick access to Mastercard Move, the company’s suite of money transfer services that works across 200 countries and over 150 currencies, reaching more than 95% of the world’s banked population. 

Sajit Vijayakumar, CEO of Infosys Finacle, said, “At Infosys Finacle, we are committed to inspiring better banking by helping customers save, pay, borrow and invest better. This engagement with Mastercard Move brings together the agility of our composable banking platform with Mastercard’s unmatched global money movement capabilities—empowering banks to deliver fast and secure cross-border experiences for every customer segment.”

Integration will be powered by Infosys Finacle, helping banks connect with Mastercard’s system in less time and with fewer resources than traditional methods.

Pratik Khowala, EVP and global head of transfer solutions at Mastercard, said, “Through Mastercard Move’s cutting-edge solutions, we empower individuals and organisations to move money quickly and securely across borders.”

The tie-up also comes at a time when global remittances are on the rise. Meanwhile, Anouska Ladds, executive VP of commercial and new payment flows, Asia Pacific at Mastercard, noted, “Global remittances continue to grow, driven by migration, digitalisation and economic development, especially across Asia, which accounted for nearly half of global inflows in 2024.”

Ladds further said that to meet this demand, Mastercard invests in smart money movement solutions within Mastercard Move while expanding its network of collaborators, such as Infosys, to bring the benefits to a more diverse set of users.

Infosys said the partnership will help banks meet growing consumer demand for faster and safer payments. 
Dennis Gada, EVP and global head of banking and financial services at Infosys, said, “Financial institutions are prioritising advancements in digital payment systems. Consumers gravitate toward institutions that offer fast, secure and seamless transaction experiences. Our collaboration with Mastercard to enable near real-time, cross-border payments is designed to significantly improve the financial experiences of everyday customers.”

The post Mastercard, Infosys Join Hands to Enhance Cross-Border Payments appeared first on Analytics India Magazine.


