7 Popular LLMs Explained in 7 Minutes

7 Popular LLMs (GPT, BERT, LLaMA & More)
Image by Author | Canva

 

We use large language models in many of our daily tasks. These models have been trained on billions of online documents and diverse datasets, making them capable of understanding and responding in human-like language. However, not all LLMs are created the same way. While the core idea remains similar, they differ in their underlying architectures, and these variations have a significant impact on their capabilities. For example, as seen across various benchmarks, DeepSeek excels at reasoning tasks, Claude performs well in coding, and ChatGPT stands out in creative writing.

In this article, I’ll walk you through 7 popular LLM architectures to give you a clear overview, all in just as many minutes. So, let’s get started.

 

1. BERT

 
Paper Link: https://arxiv.org/pdf/1810.04805
Developed by Google in 2018, BERT marked a significant shift in natural language understanding by introducing deep bidirectional attention in language modeling. Unlike earlier models that read text left-to-right or right-to-left, BERT uses a transformer encoder to consider both directions simultaneously. It is trained on two tasks: masked language modeling (predicting randomly masked words) and next-sentence prediction (determining whether one sentence logically follows another). Architecturally, BERT comes in two sizes: BERT Base (12 layers, 110M parameters) and BERT Large (24 layers, 340M parameters). Its structure relies solely on encoder stacks and includes special tokens like [CLS], which represents the full sentence, and [SEP], which separates two sentences. You can fine-tune it for tasks like sentiment analysis and question answering (e.g., SQuAD). It was one of the first models to build sentence representations from context on both sides at once.
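To see the masked-language-modeling objective in action, here is a minimal sketch using the Hugging Face transformers library (my own illustration, not code from the BERT paper; it assumes transformers and torch are installed and uses the public bert-base-uncased checkpoint):

```python
# Masked language modeling with BERT Base: predict the hidden word
# using context from BOTH sides of the [MASK] token.
from transformers import pipeline

# "bert-base-uncased" is the 12-layer, 110M-parameter BERT Base checkpoint.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK].", top_k=3):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```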

 

2. GPT

 
Paper Link (GPT 4): https://arxiv.org/pdf/2303.08774
The GPT (Generative Pre-trained Transformer) family was introduced by OpenAI. The series began with GPT-1 in 2018 and evolved to GPT-4 by 2023; the latest version, GPT-4o, released in May 2024, adds multimodal capabilities, handling both text and images. GPT models are pre-trained on very large text corpora with a standard next-token prediction language modeling objective: at each step the model predicts the next word in a sequence given all previous words. After this unsupervised pre-training stage, the same model can be fine-tuned on specific tasks or used in a zero-/few-shot way with minimal additional parameters. The decoder-only design means GPT attends only to previous tokens, unlike BERT’s bidirectional encoder. What was notable at introduction was the sheer scale and capability of GPT: as each successive generation (GPT‑2, GPT‑3) grew larger, the models demonstrated very fluent text generation and few-shot learning abilities, establishing the “pre-train and prompt/fine-tune” paradigm for large language models. However, the models are proprietary, with access typically provided via APIs, and their exact architectures, especially for recent versions, are not fully disclosed.
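GPT-4’s weights aren’t public, but the same decoder-only, next-token objective can be sketched with the open GPT-2 checkpoint as a stand-in (an illustrative example of mine, not OpenAI’s code):

```python
# Autoregressive generation: each new token is predicted from all
# previous tokens, and generate() repeats this step in a loop.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```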

 

3. LLaMA

 
LLaMA 4 Blog Link: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
Paper Link (LLaMA 3) : https://arxiv.org/abs/2407.21783
LLaMA, developed by Meta AI and first released in February 2023, is a family of open-weight, decoder-only transformer models; the latest version, Llama 4, was released in April 2025. Like GPT, LLaMA uses a Transformer decoder-only architecture (each model is an autoregressive Transformer) but with some architectural tweaks: the original LLaMA models used the SwiGLU activation instead of GeLU, rotary positional embeddings (RoPE) instead of fixed ones, and RMSNorm in place of layer norm. The family was released in multiple sizes, from 7B up to 65B parameters in LLaMA 1 and later even larger in LLaMA 3, to make large-scale models more accessible. Notably, despite relatively modest parameter counts, these models performed competitively with much larger contemporaries: Meta reported that LLaMA’s 13B model outperformed OpenAI’s 175B GPT-3 on many benchmarks, and its 65B model was competitive with models like Google’s PaLM and DeepMind’s Chinchilla. LLaMA’s open (though research-restricted) release spawned extensive community use; its key novelty was combining efficient training at scale with more open access to model weights.
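To make one of those tweaks concrete, here is a minimal PyTorch sketch of RMSNorm, following the published formulation (toy shapes; my illustration, not Meta’s implementation):

```python
# RMSNorm: LLaMA's LayerNorm replacement. It rescales by the root mean
# square of activations and drops LayerNorm's mean-centering and bias.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain only, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the hidden dimension.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

x = torch.randn(2, 8, 512)     # (batch, sequence, hidden)
print(RMSNorm(512)(x).shape)   # torch.Size([2, 8, 512])
```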

 

4. PaLM

 
PaLM 2 Technical Report: https://arxiv.org/abs/2305.10403
Paper Link (PaLM): https://arxiv.org/pdf/2204.02311
PaLM (Pathways Language Model) is a series of large language models developed by Google Research. The original PaLM (announced in 2022) was a 540-billion-parameter, decoder-only Transformer and is part of Google’s Pathways system. It was trained on a high-quality corpus of 780 billion tokens across thousands of TPU v4 chips in Google’s infrastructure, employing parallelism to achieve high hardware utilization. The model also uses multi-query attention to reduce memory bandwidth requirements during inference. PaLM is known for its few-shot learning capabilities, performing well on new tasks with minimal examples thanks to its huge and diverse training data, which includes webpages, books, Wikipedia, news, GitHub code, and social media conversations. PaLM 2, announced in May 2023, further improved multilingual, reasoning, and coding capabilities, powering applications like Google Bard and Workspace AI features.
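Here is a toy PyTorch sketch of the multi-query attention idea: all query heads share a single key/value head, which shrinks the KV cache that must be streamed at inference time (illustrative shapes of my choosing, not PaLM’s actual dimensions):

```python
# Multi-query attention (MQA): 8 query heads, but only ONE shared K/V head.
import torch
import torch.nn.functional as F

batch, seq, n_heads, head_dim = 2, 16, 8, 64

q = torch.randn(batch, n_heads, seq, head_dim)  # one Q projection per head
k = torch.randn(batch, 1, seq, head_dim)        # single shared K head
v = torch.randn(batch, 1, seq, head_dim)        # single shared V head

# Broadcasting expands the shared K/V across all 8 query heads.
scores = q @ k.transpose(-2, -1) / head_dim**0.5  # (batch, heads, seq, seq)
out = F.softmax(scores, dim=-1) @ v               # (batch, heads, seq, head_dim)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```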

 

5. Gemini

 
Gemini 2.5 Blog: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
Paper Link (Gemini 1.5): https://arxiv.org/abs/2403.05530
Paper Link (Gemini): https://arxiv.org/abs/2312.11805
Gemini is Google’s next-generation LLM family (from Google DeepMind and Google Research), introduced in late 2023. Gemini models are natively multimodal, meaning they are designed from the ground up to handle text, images, audio, video, and even code in one model. Like PaLM and GPT, Gemini is based on the Transformer, but its key features include massive scale, support for extremely long contexts, and (in Gemini 1.5) a Mixture-of-Experts (MoE) architecture for efficiency. For example, Gemini 1.5 Pro uses sparsely activated expert layers (hundreds of expert sub-networks, with only a few active per input) to boost capacity without a proportional compute cost. The Gemini 2.5 series, launched in March 2025, built on this foundation with deeper “thinking” capabilities. In June 2025, Google released Gemini 2.5 Flash and Pro as stable models and previewed Flash-Lite, its most cost-efficient and fastest version yet, optimized for high-throughput tasks while still supporting the million-token context window and tool integrations like search and code execution. The Gemini family comes in multiple sizes (Ultra, Pro, Nano), so it can run from cloud servers down to mobile devices. The combination of multimodal pretraining and MoE-based scaling makes Gemini a flexible, highly capable foundation model.
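Gemini’s internals aren’t public, so here is a generic toy sketch of the sparse top-k routing that MoE layers use: a router scores all experts per token, but only the top few actually run (my illustration, with made-up sizes):

```python
# Sparse MoE routing: capacity grows with the number of experts, while
# per-token compute stays fixed at top_k experts.
import torch
import torch.nn as nn

hidden, n_experts, top_k = 64, 8, 2

experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_experts))
router = nn.Linear(hidden, n_experts)

def moe_layer(x: torch.Tensor) -> torch.Tensor:
    """x: (n_tokens, hidden). Each token runs only its top_k experts."""
    weights, chosen = router(x).topk(top_k, dim=-1)  # (n_tokens, top_k)
    weights = weights.softmax(dim=-1)                # renormalize gate scores
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                      # for each token...
        for s in range(top_k):                       # ...run only chosen experts
            out[t] += weights[t, s] * experts[int(chosen[t, s])](x[t])
    return out

print(moe_layer(torch.randn(4, hidden)).shape)  # torch.Size([4, 64])
```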

 

6. Mistral

 
Paper Link (Mistral 7B): https://arxiv.org/abs/2310.06825
Mistral is a French AI startup that released its first LLMs in 2023. Its flagship model, Mistral 7B (September 2023), is a 7.3-billion-parameter, Transformer-based decoder model. Architecturally, Mistral 7B is similar to a GPT-style model but includes optimizations for inference: it uses grouped-query attention (GQA) to speed up self-attention and sliding-window attention to handle longer contexts more efficiently. In terms of performance, Mistral 7B outperformed Meta’s Llama 2 13B and even gave strong results against 34B models, while being much smaller. Mistral AI released the model under an Apache 2.0 license, making it freely available for use. Its next major release was Mixtral 8×7B, a sparse Mixture-of-Experts (MoE) model featuring eight 7B-parameter expert networks per layer. This design helped Mixtral match or beat GPT‑3.5 and LLaMA 2 70B on tasks like mathematics, coding, and multilingual benchmarks. In May 2025, Mistral released Mistral Medium 3, a proprietary mid-sized model aimed at enterprises. It delivers over 90% of the score of pricier models like Claude 3.7 Sonnet on standard benchmarks while cutting per-token cost dramatically (approximately $0.40 vs. $3.00 for Sonnet per million input tokens). It supports multimodal tasks (text + images) and professional reasoning, and is offered through an API or for on-prem deployment on as few as four GPUs. However, unlike earlier models, Medium 3 is closed-source, prompting community criticism that Mistral is moving away from its open-source ethos. Shortly after, in June 2025, Mistral introduced Magistral, its first model dedicated to explicit reasoning. The small version is open under Apache 2.0, while Magistral Medium is enterprise-only. Magistral Medium scored 73.6% on AIME 2024, with the small version scoring 70.7%, demonstrating strong math and logic skills across multiple languages.
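Here is a tiny sketch of the sliding-window attention mask idea: each token attends only to itself and the previous few tokens, which caps attention cost for long inputs (toy window of 4 for readability; Mistral 7B’s actual window is 4,096 tokens):

```python
# Sliding-window attention mask: position i may attend to position j
# only if j is causal AND within the last `window` positions.
import torch

seq_len, window = 8, 4
i = torch.arange(seq_len).unsqueeze(1)  # query positions (column vector)
j = torch.arange(seq_len).unsqueeze(0)  # key positions (row vector)

# Allowed iff 0 <= i - j < window.
mask = (i - j >= 0) & (i - j < window)
print(mask.int())  # banded lower-triangular 0/1 matrix
```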

 

7. DeepSeek

 
Paper Link (DeepSeek-R1): https://arxiv.org/abs/2501.12948
DeepSeek is a Chinese AI company (a spin-off of High-Flyer AI, founded in 2023) that develops large LLMs. Its recent models (such as DeepSeek-V3 and DeepSeek-R1) employ a sparsely activated Mixture-of-Experts Transformer architecture: each Transformer layer holds hundreds of expert sub-networks, but only a few (like 9 out of 257) are activated per token, depending on what each input needs. This allows DeepSeek to have a huge total model size (over 670 billion parameters) while using only about 37 billion during each response, making it much faster and cheaper to run than a dense model of similar size. Like other modern LLMs, it uses SwiGLU activations, rotary embeddings (RoPE), and advanced optimizations (including experimental FP8 precision during training) for efficiency. This aggressive MoE design lets DeepSeek achieve very high capability (comparable to much larger dense models) at lower compute cost. DeepSeek’s models, released under open licenses, attracted attention for rivaling leading models like GPT-4 in multilingual generation and reasoning while significantly reducing training and inference resource requirements.
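A quick back-of-the-envelope check of those numbers shows why this is cheap to run (simple arithmetic on the figures above; it is only a rough picture, since attention and shared layers always run and are not expert-routed):

```python
# Figures from the article: ~671B total parameters, ~37B active per token,
# and 9 of 257 experts routed per layer.
total_params = 671e9
active_params = 37e9
experts_total, experts_active = 257, 9

print(f"fraction of weights active per token:  {active_params / total_params:.1%}")  # ~5.5%
print(f"fraction of experts active per layer: {experts_active / experts_total:.1%}")  # ~3.5%
```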
 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.




AI to Track Facial Expressions to Detect PTSD Symptoms in Children

A research team from the University of South Florida (USF) has developed an AI system that can identify post-traumatic stress disorder (PTSD) in children.

The project addresses a longstanding clinical dilemma: diagnosing PTSD in children who may not have the emotional vocabulary, cognitive development or comfort to articulate their distress. Traditional methods such as subjective interviews and self-reported questionnaires often fall short. This is where AI steps in.

“Even when they weren’t saying much, you could see what they were going through on their faces,” Alison Salloum, professor at the USF School of Social Work, reportedly said. Her observations during trauma interviews laid the foundation for collaboration with Shaun Canavan, an expert in facial analysis at USF’s Bellini College of Artificial Intelligence, Cybersecurity, and Computing.

The study introduces a privacy-first, context-aware classification model that analyses subtle facial muscle movements. However, instead of using raw footage, the system extracts non-identifiable metrics such as eye gaze, mouth curvature, and head position, ensuring ethical boundaries are respected when working with vulnerable populations. 

“We don’t use raw video. We completely get rid of subject identification and only keep data about facial movement,” Canavan reportedly emphasised. The AI also accounts for conversational context, whether a child is speaking to a parent or a therapist, which significantly influences emotional expressivity.

Across 18 therapy sessions, with over 100 minutes of footage per child and approximately 185,000 frames each, the AI identified consistent facial expression patterns in children diagnosed with PTSD. Notably, children were more expressive with clinicians than with parents, a finding that aligns with psychological literature suggesting shame or emotional avoidance often inhibits open communication at home.

While still in its early stages, the tool is not being pitched as a replacement for therapists. Instead, it’s designed as a clinical augmentation, a second set of ‘digital’ eyes that can pick up on emotional signals even trained professionals might miss in real time.

“Data like this is incredibly rare for AI systems,” Canavan added. “That’s what makes this so promising. We now have an ethically sound, objective way to support mental health assessments.”

If validated on a larger scale, the system could transform mental health diagnostics for children—especially for pre-verbal or very young patients—by turning non-verbal cues into actionable insights. 




Canva Partners With NCERT to Launch AI-Powered Teacher Training


Canva has signed a memorandum of understanding (MoU) with the National Council of Educational Research and Training (NCERT) to launch free teacher training and certification programs hosted on the education ministry’s DIKSHA platform. 

The initiative aims to enhance digital literacy, creativity, and AI proficiency among educators across India, in alignment with the objectives of the National Education Policy (NEP) 2020.

As part of the agreement, Canva will offer Indian teachers free access to its education platform and provide learning materials tailored for visual and collaborative instruction. NCERT will ensure that the course content aligns with the national curriculum and is made regionally accessible. Available in multiple Indian languages, the course will also be broadcast via PM e-Vidya DTH channels to extend its reach beyond internet-enabled classrooms.

The certification program includes training on using Canva’s design tools to create engaging lesson plans, infographics, and presentations. Teachers will also learn to co-create content with students and apply AI tools to improve classroom outcomes. Upon completion, participants will receive a joint certificate from NCERT and Canva.

“This partnership is a powerful step toward equipping educators with practical digital skills that not only save time but spark imagination in every classroom,” Jason Wilmot, head of education at Canva, said in a press statement.

Chandrika Deb, country manager for India at Canva, stated, “By delivering this program free of cost, in multiple languages, and through a trusted national platform like NCERT, we are not only advancing digital fluency and creative confidence in classrooms across the country, but also deepening Canva’s long-term commitment to India, which plays a pivotal role in our vision to democratize design and creativity at scale.”

Moreover, the company shared some interesting figures. Canva has seen significant global momentum, with over 100 million students and teachers using its platform. In 2024, over 1 billion designs were created, many powered by Canva’s AI tools like Dream Lab, which enables teachers to generate custom visuals instantly. Teacher usage of AI tools has increased by 50% over the past year, with student engagement rising by 107%.

We may see further developments in this partnership as the training program for teachers progresses over time.




Capgemini to Acquire WNS for $3.3 Billion with Focus on Agentic AI


Capgemini has announced a definitive agreement to acquire WNS, a mid-sized Indian IT firm, for $3.3 billion in cash. This marks a significant step towards establishing a global leadership position in agentic AI.

The deal, unanimously approved by the boards of both companies, values WNS at $76.50 per share—a premium of 28% over the 90-day average and 17% above the July 3 closing price.

The acquisition is expected to immediately boost Capgemini’s revenue growth and operating margin, with normalised EPS accretion of 4% by 2026, increasing to 7% post-synergies in 2027.

“Enterprises are rapidly adopting generative AI and agentic AI to transform their operations end-to-end. Business process services (BPS) will be the showcase for agentic AI,” Aiman Ezzat, CEO of Capgemini, said. 

“Capgemini’s acquisition of WNS will provide the group with the scale and vertical sector expertise to capture that rapidly emerging strategic opportunity created by the paradigm shift from traditional BPS to agentic AI-powered intelligent operations.”

Pending regulatory approvals, the transaction is expected to close by the end of 2025.

WNS’ integration is expected to strengthen Capgemini’s presence in the US market while unlocking immediate cross-selling opportunities through its combined offerings and clientele. 

WNS, which reported $1.27 billion in revenue for FY25 with an 18.7% operating margin, has consistently delivered revenue growth of around 9% over the past three fiscal years.

“As a recognised leader in the digital BPS space, we see the next wave of transformation being driven by intelligent, domain-centric operations that unlock strategic value for our clients,” Keshav R Murugesh, CEO of WNS, said. “Organisations that have already digitised are now seeking to reimagine their operating models by embedding AI at the core—shifting from automation to autonomy.”

The companies expect to drive additional revenue synergies between €100 million and €140 million, with cost synergies of up to €70 million annually by the end of 2027. 

“WNS and Capgemini share a bold, future-focused vision for Intelligent Operations. I’m confident that Capgemini is the ideal partner at the right time in WNS’ journey,” Timothy L Main, chairman of WNS’ board of directors, said.

Capgemini, already a major player with over €900 million in GenAI bookings in 2024 and strategic partnerships with Microsoft, Google, AWS, Mistral AI, and NVIDIA, aims to solidify its position as a transformation partner for businesses looking to embed agentic AI at scale.


