7 Popular LLMs Explained in 7 Minutes


Image by Author | Canva

We use large language models in many of our daily tasks. These models are trained on billions of online documents and diverse datasets, which makes them capable of understanding and responding in human-like language. However, not all LLMs are built the same way. While the core idea is similar, they differ in their underlying architectures, and these variations have a significant impact on their capabilities. For example, as seen across various benchmarks, DeepSeek excels at reasoning tasks, Claude performs well in coding, and ChatGPT stands out in creative writing.

In this article, I’ll walk you through 7 popular LLM architectures to give you a clear overview, all in just as many minutes. So, let’s get started.

 

1. BERT

 
Paper Link: https://arxiv.org/pdf/1810.04805
Developed by Google in 2018, BERT marked a significant shift in natural language understanding by introducing deep bidirectional attention to language modeling. Unlike earlier models that read text strictly left-to-right or right-to-left, BERT uses a transformer encoder to consider both directions simultaneously. It is pre-trained on two tasks: masked language modeling (predicting randomly masked words) and next-sentence prediction (determining whether one sentence logically follows another). Architecturally, BERT comes in two sizes: BERT Base (12 layers, 110M parameters) and BERT Large (24 layers, 340M parameters). Its structure relies solely on encoder stacks and includes special tokens such as [CLS], which represents the full sequence, and [SEP], which separates two sentences. You can fine-tune it for tasks like sentiment analysis and question answering (for example, on SQuAD). It was among the first models to build genuinely bidirectional representations of whole sentences.
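To make the masked-language-modeling objective concrete, here is a minimal sketch using the Hugging Face transformers library (the library, checkpoint name, and example sentence are my own illustrative choices, not part of the original paper):

```python
# Minimal sketch: BERT's masked-language-modeling objective in practice.
# Assumes `pip install transformers torch`; the checkpoint is illustrative.
from transformers import pipeline

# "fill-mask" loads bert-base-uncased together with its MLM prediction head.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT attends to the words on both sides of [MASK] when ranking candidates.
for pred in fill_mask("The capital of France is [MASK]."):
    print(f"{pred['token_str']:>10}  score={pred['score']:.3f}")
```

Because the encoder sees the whole sentence at once, the top predictions are informed by context on both sides of the mask.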

 

2. GPT

 
Paper Link (GPT-4): https://arxiv.org/pdf/2303.08774
The GPT (Generative Pre-trained Transformer) family was introduced by OpenAI. The series began with GPT-1 in 2018 and evolved to GPT-4 by 2023; GPT-4o, released in May 2024, added multimodal capabilities, handling both text and images. The models are pre-trained on very large text corpora with a standard next-token-prediction objective: at each step, the model predicts the next word in a sequence given all previous words. After this unsupervised pre-training stage, the same model can be fine-tuned on specific tasks or used in a zero- or few-shot way with minimal additional parameters. The decoder-only design means GPT attends only to previous tokens, unlike BERT’s bidirectional encoder. What stood out at introduction was the sheer scale and capability of GPT: as each successive generation (GPT-2, GPT-3) grew larger, the models demonstrated remarkably fluent text generation and few-shot learning, establishing the “pre-train, then prompt or fine-tune” paradigm for large language models. However, the models are proprietary, with access typically provided via APIs, and their exact architectures, especially for recent versions, are not fully disclosed.
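The next-token objective is easy to demonstrate with the openly released GPT-2 weights; here is a minimal sketch (the checkpoint, prompt, and greedy decoding are illustrative choices):

```python
# Minimal sketch: GPT-style autoregressive generation with GPT-2 weights.
# Assumes `pip install transformers torch`.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The decoder-only model repeatedly predicts the next token given all
# previous tokens; do_sample=False keeps the single most likely token.
inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```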

 

3. LLaMA

 
LLaMA 4 Blog Link: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
Paper Link (LLaMA 3): https://arxiv.org/abs/2407.21783
LLaMA, developed by Meta AI and first released in February 2023, is a family of open-weight, decoder-only transformer models; the latest version, Llama 4, arrived in April 2025. Like GPT, each LLaMA model is an autoregressive transformer, but with some architectural tweaks: the original models used the SwiGLU activation instead of GeLU, rotary positional embeddings (RoPE) instead of fixed ones, and RMSNorm in place of layer normalization. The family was released in multiple sizes, from 7B up to 65B parameters in LLaMA 1 and larger still in LLaMA 3, to make large-scale models more accessible. Notably, despite relatively modest parameter counts, these models performed competitively with much larger contemporaries: Meta reported that LLaMA’s 13B model outperformed OpenAI’s 175B GPT-3 on many benchmarks, and that its 65B model was competitive with Google’s PaLM and DeepMind’s Chinchilla. LLaMA’s open (though initially research-restricted) release spawned extensive community use; its key novelty was combining efficient training at scale with broader access to model weights.
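Two of those tweaks are small enough to sketch directly. Here is a toy PyTorch version of RMSNorm and a SwiGLU feed-forward block (the dimensions are illustrative, not LLaMA's actual sizes):

```python
# Toy sketches of two LLaMA building blocks; dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales features, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward block, used in place of a plain GeLU MLP."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)            # (batch, sequence, embedding)
y = SwiGLU(512, 1376)(RMSNorm(512)(x))
print(y.shape)                         # torch.Size([2, 16, 512])
```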

 

4. PaLM

 
PaLM 2 Technical Report: https://arxiv.org/abs/2305.10403
Paper Link (PaLM): https://arxiv.org/pdf/2204.02311
PaLM (Pathways Language Model) is a series of large language models developed by Google Research. The original PaLM (announced in 2022) was a 540-billion-parameter, decoder-only transformer trained as part of Google’s Pathways system. It was trained on a high-quality corpus of 780 billion tokens across thousands of TPU v4 chips in Google’s infrastructure, using large-scale parallelism to achieve high hardware utilization. The model also uses multi-query attention to reduce memory-bandwidth requirements during inference. PaLM is known for its few-shot learning capabilities, performing well on new tasks from minimal examples thanks to its huge, diverse training data, which includes webpages, books, Wikipedia, news, GitHub code, and social-media conversations. PaLM 2, announced in May 2023, further improved multilingual, reasoning, and coding capabilities, powering applications like Google Bard and Workspace AI features.
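Multi-query attention is the most self-contained of these ideas: every query head shares a single key/value head, which shrinks the KV cache during decoding. Here is a toy PyTorch sketch of the shapes involved (head counts and dimensions are illustrative):

```python
# Toy sketch of multi-query attention (MQA); sizes are illustrative.
import torch
import torch.nn.functional as F

batch, seq, n_heads, head_dim = 1, 8, 16, 64

q = torch.randn(batch, n_heads, seq, head_dim)  # one query projection per head
k = torch.randn(batch, 1, seq, head_dim)        # a single shared key head
v = torch.randn(batch, 1, seq, head_dim)        # a single shared value head

# Broadcasting lets all 16 query heads attend over the same K/V tensors,
# so the inference cache holds 1 head of K/V instead of 16.
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
out = F.softmax(scores, dim=-1) @ v
print(out.shape)                                # torch.Size([1, 16, 8, 64])
```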

 

5. Gemini

 
Gemini 2.5 Blog: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
Paper Link (Gemini 1.5): https://arxiv.org/abs/2403.05530
Paper Link (Gemini): https://arxiv.org/abs/2312.11805
Gemini is Google’s next-generation LLM family (from Google DeepMind and Google Research), introduced in late 2023. Gemini models are natively multimodal, meaning they are designed from the ground up to handle text, images, audio, video, and even code in one model. Like PaLM and GPT, Gemini is based on the transformer, but its distinguishing features are massive scale, support for extremely long contexts, and (from Gemini 1.5 onward) a Mixture-of-Experts (MoE) architecture for efficiency. Gemini 1.5 Pro, for example, uses sparsely activated expert layers (hundreds of expert sub-networks, with only a few active per input) to boost capacity without a proportional increase in compute cost. The Gemini 2.5 series, launched in March 2025, built on this foundation with deeper “thinking” capabilities. In June 2025, Google released Gemini 2.5 Flash and Pro as stable models and previewed Flash-Lite, its most cost-efficient and fastest version yet, optimized for high-throughput tasks while still supporting the million-token context window and tool integrations like search and code execution. The Gemini family comes in multiple sizes (Ultra, Pro, Nano), so it can run from cloud servers down to mobile devices. The combination of multimodal pre-training and MoE-based scaling makes Gemini a flexible, highly capable foundation model.
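Gemini's internals are not public, but the general top-k expert-routing pattern behind MoE layers can be sketched in a few lines of PyTorch (everything here, from the expert count to the linear router, is a simplified assumption rather than Gemini's actual design):

```python
# Toy sketch of sparse Mixture-of-Experts routing; all sizes are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)
        top_w, top_idx = weights.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for t, (idxs, ws) in enumerate(zip(top_idx, top_w)):
            for i, w in zip(idxs.tolist(), ws):
                out[t] += w * self.experts[i](x[t])  # only k experts run
        return out

print(TopKMoE()(torch.randn(4, 64)).shape)       # torch.Size([4, 64])
```

Capacity scales with the number of experts, while per-token compute scales only with k.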

 

6. Mistral

 
Paper Link (Mistral 7B): https://arxiv.org/abs/2310.06825
Mistral AI is a French startup that released its first LLMs in 2023. Its flagship model, Mistral 7B (September 2023), is a 7.3-billion-parameter, decoder-only transformer. Architecturally, Mistral 7B is similar to a GPT-style model but includes optimizations for inference: it uses grouped-query attention (GQA) to speed up self-attention and sliding-window attention to handle longer contexts more efficiently. In terms of performance, Mistral 7B outperformed Meta’s Llama 2 13B and even gave strong results against 34B models, while being much smaller. Mistral AI released the model under an Apache 2.0 license, making it freely available for use. Its next major release was Mixtral 8×7B, a sparse Mixture-of-Experts (MoE) model featuring eight 7B-parameter expert networks per layer. This design helped Mixtral match or beat GPT-3.5 and LLaMA 2 70B on tasks like mathematics, coding, and multilingual benchmarks. In May 2025, Mistral released Mistral Medium 3, a proprietary mid-sized model aimed at enterprises. It delivers over 90% of the score of pricier models like Claude 3.7 Sonnet on standard benchmarks while reducing per-token cost dramatically (approximately $0.40 per million input tokens vs. $3.00 for Sonnet). It supports multimodal tasks (text + images) and professional reasoning, and is offered through an API or for on-premises deployment on as few as four GPUs. However, unlike earlier models, Medium 3 is closed-source, prompting community criticism that Mistral is moving away from its open-source ethos. Shortly after, in June 2025, Mistral introduced Magistral, its first models dedicated to explicit reasoning. The small version is open under Apache 2.0, while Magistral Medium is enterprise-only. Magistral Medium scored 73.6% on AIME 2024, with the small version scoring 70.7%, demonstrating strong math and logic skills across multiple languages.
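Sliding-window attention comes down to a banded causal mask: each token attends to itself and at most window − 1 previous tokens instead of the full history. A small PyTorch sketch (the window here is tiny for readability; the Mistral 7B paper uses 4,096):

```python
# Toy sketch of a sliding-window causal attention mask.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    # Allowed: causal (j <= i) and within the window (i - j < window).
    return (j <= i) & (i - j < window)

print(sliding_window_mask(6, window=3).int())
# tensor([[1, 0, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0, 0],
#         [1, 1, 1, 0, 0, 0],
#         [0, 1, 1, 1, 0, 0],
#         [0, 0, 1, 1, 1, 0],
#         [0, 0, 0, 1, 1, 1]])
```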

 

7. DeepSeek

 
Paper Link (DeepSeek-R1): https://arxiv.org/abs/2501.12948
DeepSeek is a Chinese AI company (a spin-off of High-Flyer AI, founded in 2023) that develops large LLMs. Its recent models (such as DeepSeek-V3 and DeepSeek-R1) employ a highly sparsely activated Mixture-of-Experts transformer architecture: each transformer layer has hundreds of expert sub-networks, but only a few (for instance, 9 out of 257) are activated per token, depending on what each input needs. This allows DeepSeek to have a huge total model size (over 670 billion parameters) while using only about 37 billion during each response, making it much faster and cheaper to run than a dense model of similar size. Like other modern LLMs, it uses SwiGLU activations, rotary embeddings (RoPE), and advanced optimizations (including experimental FP8 precision during training) to improve efficiency. This aggressive MoE design lets DeepSeek achieve very high capability (comparable to much larger dense models) at lower compute cost. DeepSeek’s models, released under open licenses, attracted attention for rivaling leading models like GPT-4 in multilingual generation and reasoning, while significantly reducing training and inference resource requirements.
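The arithmetic behind that efficiency claim is worth spelling out; here is a back-of-the-envelope sketch using the figures quoted above (rounded, as reported; the 256-routed-plus-1-shared breakdown is my reading of the 9-of-257 figure):

```python
# Back-of-the-envelope: why sparse MoE activation keeps inference cheap.
total_params = 671e9     # reported total parameter count (approximate)
active_params = 37e9     # reported parameters activated per token

experts_per_layer = 257  # assumed: 256 routed experts + 1 shared expert
active_experts = 9       # assumed: 8 routed + 1 shared selected per token

print(f"experts active per token: {active_experts}/{experts_per_layer} "
      f"({active_experts / experts_per_layer:.1%})")
print(f"weights used per token:   {active_params / total_params:.1%} "
      f"of {total_params / 1e9:.0f}B total")
```

So each response touches only about 5-6% of the weights, which is where the speed and cost advantage over an equally sized dense model comes from.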
 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.



HCLSoftware Launches Domino 14.5 With Focus on Data Privacy and Sovereign AI

HCLSoftware, a global enterprise software leader, launched HCL Domino 14.5 on July 7 as a major upgrade, specifically targeting governments and organisations operating in regulated sectors that are concerned about data privacy and digital independence.

A key feature of the new release is Domino IQ, a sovereign AI extension built into the Domino platform. The tool gives organisations full control over their AI models and data, helping them comply with regulations such as the European AI Act. It also removes dependence on foreign cloud services, making it easier for public-sector bodies and banks to protect sensitive information.

“The importance of data sovereignty and avoiding unnecessary foreign government influence extends beyond SaaS solutions and AI. Specifically for collaboration – the sensitive data within email, chat, video recordings and documents. With the launch of Domino+ 14.5, HCLSoftware is helping 200+ government agencies safeguard their sensitive data,” said Richard Jefts, executive vice president and general manager at HCLSoftware.

The updated Domino+ collaboration suite now includes enhanced features for secure messaging, meetings, and file sharing. These tools are ready to deploy and meet the needs of organisations that handle highly confidential data.

The platform is supported by IONOS, a leading European cloud provider. Achim Weiss, CEO of IONOS, added, “Today, more than ever, true digital sovereignty is the key to Europe’s digital future. That’s why at IONOS we are proud to provide the sovereign cloud infrastructure for HCL’s sovereign collaboration solutions.”

Other key updates in Domino 14.5 include BSI certification for information security, integration with security information and event management (SIEM) tools to enhance threat detection and response, and full compliance with the European Accessibility Act, ensuring that all web-based user experiences are inclusive and accessible to everyone.

With the launch of Domino 14.5, HCLSoftware is aiming to be a trusted technology partner for public sector and highly regulated organisations seeking control, security, and compliance in their digital operations.




Mitsubishi Electric Invests in AI-Assisted PLM Systems Startup ‘Things’


Mitsubishi Electric Corporation announced on July 7 that its ME Innovation Fund has invested in Things, a Japan-based startup that develops and provides AI-assisted product lifecycle management (PLM) systems for the manufacturing industry. 

This startup specialises in comprehensive document management, covering everything from product planning and development to disposal. According to the company, this marks the 12th investment made by Mitsubishi’s fund to date.

Through this investment, Mitsubishi Electric aims to combine its extensive manufacturing and control expertise with Things’ generative AI technology. The goal is to accelerate the development of digital transformation (DX) solutions that tackle various challenges facing the manufacturing industry.

In recent years, Japan’s manufacturing sector has encountered several challenges, including labour shortages and the ageing of skilled technicians, which hinder the transfer of expertise. In response, DX initiatives, such as the implementation of PLM and other digital systems, have progressed rapidly. However, these initiatives have faced challenges related to development time, cost, usability, and scalability.

Komi Matsubara, an executive officer at Mitsubishi Electric Corporation, stated, “Through our collaboration with Things, we expect to generate new value by integrating our manufacturing expertise with Things’ generative AI technology. We aim to leverage this initiative to enhance the overall competitiveness of the Mitsubishi Electric group.”

Things launched its ‘PRISM’ PLM system in May 2023, utilising generative AI to improve the structure and usage of information in manufacturing. PRISM offers significant cost and scalability advantages, enhancing user interfaces and experiences while effectively implementing proofs of concept across a wide range of companies.

Atsuya Suzuki, CEO of Things, said, “We are pleased to establish a partnership with Mitsubishi Electric through the ME Innovation Fund. By combining our technology with Mitsubishi Electric’s expertise in manufacturing and control, we aim to accelerate the global implementation of pioneering DX solutions for manufacturing.”




AI to Track Facial Expressions to Detect PTSD Symptoms in Children


A research team from the University of South Florida (USF) has developed an AI system that can identify post-traumatic stress disorder (PTSD) in children.

The project addresses a longstanding clinical dilemma: diagnosing PTSD in children who may not have the emotional vocabulary, cognitive development or comfort to articulate their distress. Traditional methods such as subjective interviews and self-reported questionnaires often fall short. This is where AI steps in.

“Even when they weren’t saying much, you could see what they were going through on their faces,” Alison Salloum, professor at the USF School of Social Work, reportedly said. Her observations during trauma interviews laid the foundation for collaboration with Shaun Canavan, an expert in facial analysis at USF’s Bellini College of Artificial Intelligence, Cybersecurity, and Computing.

The study introduces a privacy-first, context-aware classification model that analyses subtle facial muscle movements. Instead of using raw footage, the system extracts non-identifiable metrics such as eye gaze, mouth curvature, and head position, ensuring ethical boundaries are respected when working with vulnerable populations.

“We don’t use raw video. We completely get rid of subject identification and only keep data about facial movement,” Canavan reportedly emphasised. The AI also accounts for conversational context, whether a child is speaking to a parent or a therapist, which significantly influences emotional expressivity.

Across 18 therapy sessions, with over 100 minutes of footage per child and approximately 185,000 frames each, the AI identified consistent facial expression patterns in children diagnosed with PTSD. Notably, children were more expressive with clinicians than with parents, a finding that aligns with psychological literature suggesting that shame or emotional avoidance often inhibits open communication at home.

While still in its early stages, the tool is not being pitched as a replacement for therapists. Instead, it’s designed as a clinical augmentation, a second set of ‘digital’ eyes that can pick up on emotional signals even trained professionals might miss in real time.

“Data like this is incredibly rare for AI systems,” Canavan added. “That’s what makes this so promising. We now have an ethically sound, objective way to support mental health assessments.”

If validated on a larger scale, the system could transform mental health diagnostics for children—especially for pre-verbal or very young patients—by turning non-verbal cues into actionable insights. 


