Jobs & Careers
PhysicsWallah Launches Aryabhata 1.0 AI Model to Help Students with JEE Main Maths

PhysicsWallah (PW) has launched Aryabhata 1.0, a compact language model focused on mathematics education for competitive exams in India, beginning with JEE Main. The model is part of PW’s broader initiative to build Small Language Models (SLMs) tailored to the real-world needs of Indian learners.
Named after the ancient Indian mathematician Aryabhata, the model draws inspiration from India’s long tradition of mathematical innovation.
Aryabhata 1.0 was initially trained to tackle JEE Main Mathematics and scored 86% in the January 2025 session and 90.2% in the April session. It achieved this despite being a relatively small model, with 7 billion parameters, trained on a single H100 GPU.
“Sometimes, the right support at the right moment can change a student’s entire path,” said Prateek Maheshwari, co-founder of PW and chair of the Indian Edtech Consortium. “At PW, we believe AI can offer that support, if it’s built with care, context, and purpose.”
The model is powered by 130,000 curated question-answer pairs and uses a combination of supervised fine-tuning (SFT), reinforcement learning (RL), and rejection sampling to improve output quality. PW says this approach grounds the model’s reasoning in pedagogical patterns that align with how students are taught for competitive exams.
Maheshwari noted that Aryabhata 1.0 is part of a larger roadmap. “We’re working to expand Aryabhata 1.0 to cover JEE Advanced and a broader range of mathematical domains.”
PW is inviting educators, developers, and researchers to test the model and provide feedback to help refine future iterations. The goal is to eventually build a suite of subject-centric SLMs that support learning across disciplines in an AI-driven education system.
In other news, Google’s Gemini 2.5 Pro recently scored 336.2 out of 360 marks, outperforming the student who topped the exam with 332 marks in 2025.
What is Data Science in Simple Words?


Image by Editor | ChatGPT
# Introduction
“Data science”, “data scientist”, “data-driven systems and processes”, and so on…
Data is everywhere and has become a key element in every industry and business, as well as in our daily lives. But with so many data-related terms and buzzwords, it is easy to get lost and lose track of what each one actually means, especially one of the broadest concepts: data science. This article explains in simple terms what data science is (and what it isn't), the knowledge areas it involves, common data science processes in the real world, and their impact.
# What is Data Science?
Data science is best described as a blended discipline that combines multiple knowledge areas (explained shortly). Its primary focus is on using and leveraging data to reveal patterns, answer questions, and support decisions — three critical aspects needed in virtually every business and organization today.
Take a retail firm, for instance: data science can help it identify best-selling products in certain seasons (patterns), explain why certain customers are leaving for competitors (questions), and determine how much inventory to stock for next winter (decisions). Since data is the core asset in any data science process, it is important to identify the relevant data sources. In this retail example, these could include purchase histories, customer browsing behavior, and sales figures over time.


Data science example applied to the retail sector | Image generated by OpenAI and partly modified by the Author
So, what are the three key areas that, when blended together, form the scope of data science?
- Math and statistics, to analyze, measure, and understand the main properties of the data
- Computer science, to manage and process large datasets efficiently and effectively through software implementations of mathematical and statistical methods
- Domain knowledge, to ease the “real-world translation” of processes applied, understand requirements, and apply insights gained to the specific application domain: business, health, sports, etc.
Data science is a blended discipline that combines multiple knowledge areas.
# Real World Scope, Processes, and Impact
With so many related areas, like data analysis, data visualization, analytics, and even artificial intelligence (AI), it is important to demystify what data science isn’t. Data science is not limited to collecting, storing, and managing data in databases or performing shallow analyses, nor is it a magic wand that provides answers without domain knowledge and context. It is neither the same as artificial intelligence nor its most data-related subdomain: machine learning.
While AI and machine learning focus on building systems that mimic intelligence by learning from data, data science encompasses the comprehensive process of gathering, cleaning, exploring, and interpreting data to draw insights and guide decision-making. Thus, in simple terms, the essence of data science processes is to deeply analyze and understand data to connect it to the real-world problem at hand.
These activities are often framed as part of a data science lifecycle: a structured, cyclical workflow that typically moves from understanding the business problem to collecting and preparing data, analyzing and modeling it, and finally deploying and monitoring solutions. This ensures that data-driven projects remain practical, aligned with real needs, and continuously improved.
Data science impacts real-world processes in businesses and organizations in several ways:
- Revealing patterns in complex datasets, for instance, customer behavior and preferences over products
- Improving operational and strategic decision-making with insights derived from data, to optimize processes, reduce costs, etc.
- Predicting trends or events, e.g., future demand (the use of machine learning techniques as part of data science processes is common for this purpose)
- Personalizing user experiences through products, content, and services, adapting them to users' preferences or needs
To broaden the picture, here are a couple of other domain examples:
- Healthcare: Predicting patient readmission rates, identifying disease outbreaks from public health data, or aiding drug discovery through the analysis of genetic sequences
- Finance: Detecting fraudulent credit card transactions in real time or building models to assess loan risk and creditworthiness
# Clarifying Related Roles
Beginners often find it confusing to distinguish between the many roles in the data space. While data science is broad, here’s a simple breakdown of some of the most common roles you’ll encounter:
- Data Analyst: Focuses on describing the past and present, often through reports, dashboards, and descriptive statistics to answer business questions
- Data Scientist: Works on prediction and inference, often building models and running experiments to forecast future outcomes and uncover hidden insights
- Machine Learning Engineer: Specializes in taking the models created by data scientists and deploying them into production, ensuring they run reliably and at scale
| Role | Focus | Key Activities |
|---|---|---|
| Data Analyst | Describing the past and present | Creates reports and dashboards, uses descriptive statistics, and answers business questions with visualizations. |
| Data Scientist | Prediction and inference | Builds machine learning models, experiments with data, forecasts future outcomes, and uncovers hidden insights. |
| Machine Learning Engineer | Deploying and scaling models | Turns models into production-ready systems, ensures scalability and reliability, and monitors model performance over time. |
Understanding these distinctions helps cut through the buzzwords and makes it easier to see how the pieces fit together.
# Tools of the Trade
So, how do data scientists actually do their work? A key part of the story is the toolkit they rely on to accomplish their tasks.
Data scientists commonly use programming languages like Python and R. Popular libraries for Python (for example) include:
- Pandas for data manipulation
- Matplotlib and Seaborn for visualization
- Scikit-learn or PyTorch for building machine learning models
These tools lower the barrier to entry and make it possible to move quickly from raw data to actionable insights, without having to build your own tooling from scratch.
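To make this concrete, here is a minimal sketch of how these libraries fit together: loading data with pandas, exploring it with descriptive statistics, and fitting a simple model with scikit-learn. The retail dataset and its column names are made up purely for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical weekly retail data: ad spend vs. units sold
df = pd.DataFrame({
    "ad_spend": [100, 200, 300, 400, 500],
    "units_sold": [12, 22, 31, 43, 52],
})

# Explore: descriptive statistics (the math & statistics side)
print(df.describe())

# Model: fit a regression to predict demand at a new spend level
model = LinearRegression()
model.fit(df[["ad_spend"]], df["units_sold"])
pred = model.predict(pd.DataFrame({"ad_spend": [600]}))
print(f"Predicted units sold at spend 600: {pred[0]:.1f}")
```

Even this tiny example touches all three knowledge areas: statistics to summarize the data, computer science to process it, and domain knowledge to decide that ad spend is a sensible predictor of demand.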
# Conclusion
Data science is a blended, multidisciplinary field that combines math, computer science, and domain expertise to reveal patterns, answer questions, and guide decisions. It isn’t the same as AI or machine learning, though those often play a part. Instead, it’s the structured, practical application of data to solve real-world problems and drive impact.
From retail to healthcare to finance, its applications are everywhere. Whether you’re just getting started or clarifying the buzzwords, understanding the scope, processes, and roles in data science provides a clear first step into this exciting field.
I hope you’ve enjoyed this concise, gentle introduction!
Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
5 Reasons Why Vibe Coding Threatens Secure Data App Development


Image by Author | ChatGPT
# Introduction
AI-generated code is everywhere. Since early 2025, “vibe coding” (letting AI write code from simple prompts) has exploded across data science teams. It’s fast, it’s accessible, and it’s creating a security disaster. Recent research from Veracode shows AI models pick insecure code patterns 45% of the time. For Java applications? That jumps to 72%. If you’re building data apps that handle sensitive information, these numbers should worry you.
AI coding promises speed and accessibility. But let’s be honest about what you’re trading for that convenience. Here are five reasons why vibe coding poses threats to secure data application development.
# 1. Your Code Learns From Broken Examples
The problem is that a majority of analyzed codebases contain at least one vulnerability, and many harbor high-risk flaws. When you use AI coding tools, you are rolling the dice with patterns learned from this vulnerable code.
AI assistants can’t tell secure patterns from insecure ones. This leads to SQL injections, weak authentication, and exposed sensitive data. For data applications, this creates immediate risks where AI-generated database queries enable attacks against your most critical information.
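To see why this matters, here is a short sketch contrasting the vulnerable query pattern AI tools often emit with the parameterized alternative. The table and data are hypothetical, using an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

# A classic injection payload supplied as "user input"
user_input = "alice' OR '1'='1"

# Vulnerable pattern: string interpolation lets input rewrite the SQL
vulnerable = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe pattern: a parameterized query treats input as data, not SQL
safe = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()

print(vulnerable)  # the injection matches every row
print(safe)        # no row matches the literal string
```

The interpolated query returns the admin row because the payload rewrites the `WHERE` clause; the parameterized query correctly returns nothing.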
# 2. Hardcoded Credentials and Secrets in Data Connections
AI code generators have a dangerous habit of hardcoding credentials directly in source code, creating a security nightmare for data applications that connect to databases, cloud services, and APIs containing sensitive information. This practice becomes catastrophic when these hardcoded secrets persist in version control history and can be discovered by attackers years later.
AI models often generate database connections with passwords, API keys, and connection strings embedded directly in application code rather than using secure configuration management. The convenience of having everything just work in AI-generated examples creates a false sense of security while leaving your most sensitive access credentials exposed to anyone with code repository access.
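As an illustrative sketch of the safer pattern, the snippet below reads the secret from the environment instead of the source file. The `DB_PASSWORD` variable name is hypothetical, and the environment assignment at the end only simulates deployment configuration for the demo.

```python
import os

# Anti-pattern AI generators often produce: a secret embedded in source code,
# where it persists in version control history forever.
# DB_PASSWORD = "s3cret-prod-password"  # never do this

# Safer pattern: read secrets from the environment (or a secret manager).
def get_db_password() -> str:
    password = os.environ.get("DB_PASSWORD")
    if password is None:
        raise RuntimeError("DB_PASSWORD is not set; configure it outside the code")
    return password

# Simulate deployment configuration for this demo only.
os.environ["DB_PASSWORD"] = "example-only"
print(get_db_password())
```

In production, the same idea extends to dedicated secret managers, which add rotation and audit logging on top of keeping credentials out of the repository.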
# 3. Missing Input Validation in Data Processing Pipelines
Data science applications frequently handle user inputs, file uploads, and API requests, yet AI-generated code consistently fails to implement proper input validation. This creates entry points for malicious data injection that can corrupt entire datasets or enable code execution attacks.
AI models may lack information about an application’s security requirements and can produce code that accepts any filename without validation, enabling path traversal attacks. This becomes dangerous in data pipelines, where unvalidated inputs can corrupt entire datasets, bypass security controls, or let attackers access files outside the intended directory structure.
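The traversal risk can be illustrated with a short sketch. The upload directory and filenames below are hypothetical; the key idea is to resolve the user-supplied name and reject anything that escapes the intended directory.

```python
from pathlib import Path

# Hypothetical upload directory for a data pipeline
UPLOAD_DIR = Path("/srv/app/uploads")

def resolve_upload(filename: str) -> Path:
    """Resolve a user-supplied filename, rejecting path traversal."""
    candidate = (UPLOAD_DIR / filename).resolve()
    if not candidate.is_relative_to(UPLOAD_DIR.resolve()):
        raise ValueError(f"path traversal attempt rejected: {filename!r}")
    return candidate

print(resolve_upload("report.csv"))       # stays inside the upload directory
try:
    resolve_upload("../../../etc/passwd")
except ValueError as err:
    print(err)                            # traversal attempt is rejected
```

Resolving before checking is the important step: a naive prefix check on the raw string would miss `..` segments that escape the directory only after normalization.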
# 4. Inadequate Authentication and Authorization
AI-generated authentication systems often implement basic functionality without considering the security implications for data access control, creating weak points in your application’s security perimeter. Real cases have shown AI-generated code storing passwords using deprecated algorithms like MD5, implementing authentication without multi-factor authentication, and creating insufficient session management systems.
Data applications require solid access controls to protect sensitive datasets, but vibe coding frequently produces authentication systems that lack role-based access controls for data permissions. The AI’s training on older, simpler examples means it often suggests authentication patterns that were acceptable years ago but are now considered security anti-patterns.
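To contrast the deprecated and stronger hashing approaches using only the standard library, here is an illustrative sketch. In production, a dedicated password-hashing library such as bcrypt or Argon2 would be preferable; salted PBKDF2 stands in here as the stdlib option.

```python
import hashlib
import hmac
import os

password = b"correct horse battery staple"

# Deprecated pattern sometimes emitted by AI tools: fast, unsalted MD5,
# which is trivially attacked with precomputed tables.
weak = hashlib.md5(password).hexdigest()

# Stronger stdlib alternative: per-user salt plus many PBKDF2 iterations.
salt = os.urandom(16)
strong = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)

def verify(attempt: bytes, salt: bytes, stored: bytes) -> bool:
    # Constant-time comparison avoids leaking information via timing
    return hmac.compare_digest(
        hashlib.pbkdf2_hmac("sha256", attempt, salt, 600_000), stored
    )

print(verify(password, salt, strong))        # correct password accepted
print(verify(b"wrong guess", salt, strong))  # wrong password rejected
```

The salt defeats precomputed-table attacks, the high iteration count slows brute force, and the constant-time comparison blocks timing side channels, all properties the MD5 one-liner lacks.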
# 5. False Security From Inadequate Testing
Perhaps the most dangerous aspect of vibe coding is the false sense of security it creates when applications appear to function correctly while harboring serious security flaws. AI-generated code often passes basic functionality tests while concealing vulnerabilities like logic flaws that affect business processes, race conditions in concurrent data processing, and subtle bugs that only appear under specific conditions.
The problem is exacerbated because teams using vibe coding may lack the technical expertise to identify these security issues, creating a dangerous gap between perceived security and actual security. Organizations become overconfident in their applications’ security posture based on successful functional testing, not realizing that security testing requires entirely different methodologies and expertise.
# Building Secure Data Applications in the Age of Vibe Coding
The rise of vibe coding doesn’t mean data science teams should abandon AI-assisted development entirely. Studies of GitHub Copilot have found increased task completion speed for both junior and senior developers, demonstrating clear productivity benefits when the tools are used responsibly.
But here’s what actually works: successful teams using AI coding tools implement multiple safeguards rather than hoping for the best. The key is to never deploy AI-generated code without a security review; use automated scanning tools to catch common vulnerabilities; implement proper secret management systems; establish strict input validation patterns; and never rely solely on functional testing for security validation.
Successful teams implement a multi-layered approach:
- Security-aware prompting that includes explicit security requirements in every AI interaction
- Automated security scanning with tools like OWASP ZAP and SonarQube integrated into CI/CD pipelines
- Human security review by security-trained developers for all AI-generated code
- Continuous monitoring with real-time threat detection
- Regular security training to keep teams current on AI coding risks
# Conclusion
Vibe coding represents a major shift in software development, but it comes with serious security risks for data applications. The convenience of natural language programming can’t override the need for security-by-design principles when handling sensitive data.
There has to be a human in the loop. If an application is fully vibe-coded by someone who cannot even review the code, they cannot determine whether it is secure. Data science teams must approach AI-assisted development with both enthusiasm and caution, embracing the productivity gains while never sacrificing security for speed.
The companies that figure out secure vibe coding practices today will be the ones that thrive tomorrow. Those that don’t may find themselves explaining security breaches instead of celebrating innovation.
Vinod Chugani was born in India and raised in Japan, and brings a global perspective to data science and machine learning education. He bridges the gap between emerging AI technologies and practical implementation for working professionals, creating accessible learning pathways for complex topics like agentic AI, performance optimization, and AI engineering. He also mentors the next generation of data professionals through live sessions and personalized guidance.
Without the Hype, Apple Rolls Out FastVLM and MobileCLIP on Hugging Face

Apple has made two of its latest vision-language and image-text models, FastVLM and MobileCLIP, publicly available on Hugging Face, highlighting its quiet but steady progress in AI research.
The release caught wider attention after Clem Delangue, CEO and co-founder of Hugging Face, posted on X, noting that Apple’s models are “up to 85x faster and 3.4x smaller than previous work, enabling real-time VLM applications” and can even perform live video captioning locally in a browser.
He also mentioned, “If you think Apple is not doing much in AI, you’re getting blindsided by the chatbot hype and not paying enough attention!”
His remark was a reminder that while Apple avoids chatbot hype, its AI work is aimed at efficiency and on-device usability.
FastVLM, as per the research paper, tackles one of the long-standing challenges in vision-language models, which is balancing accuracy with latency.
Higher-resolution inputs typically improve accuracy but slow down processing. Apple researchers addressed this with FastViT-HD, a new hybrid vision encoder designed to produce fewer but higher-quality tokens. The result is a VLM that not only outperforms previous architectures in speed but also maintains strong accuracy, making it practical for tasks such as accessibility, robotics, and UI navigation.
The companion model, MobileCLIP, extends Apple’s push for efficient multimodal learning. Built through a novel multi-modal reinforced training approach, MobileCLIP delivers faster runtime and improved accuracy compared to prior CLIP-based models. According to Apple researchers, the MobileCLIP-S2 variant runs 2.3 times faster while being more accurate than earlier ViT-B/16 baselines, setting new benchmarks for mobile deployment.
The Hugging Face model page explains that the model has been exported to run with MLX, Apple’s framework for machine learning on Apple Silicon. To use it in an iOS or macOS app, developers need to follow the instructions in the official repository.
With these releases, Apple signals that its AI ambitions lie not in competing directly with chatbot platforms, but in advancing efficient, privacy-preserving models optimised for real-world, on-device use.
The post Without the Hype, Apple Rolls Out FastVLM and MobileCLIP on Hugging Face appeared first on Analytics India Magazine.