Jobs & Careers
Ray or Dask? A Practical Guide for Data Scientists


As data scientists, we often work with large datasets or complex models that take a long time to run. To get results faster, we use tools that run tasks in parallel, either on one machine or across several. Two popular Python libraries for this are Ray and Dask. Both speed up data processing and model training, but they suit different types of tasks.
In this article, we will explain what Ray and Dask are and when to choose each one.
# What Are Dask and Ray?
Dask is a library used for handling large amounts of data. It is designed to work in a way that feels familiar to users of pandas, NumPy, or scikit-learn. Dask breaks data and tasks into smaller parts and runs them in parallel. This makes it perfect for data scientists who want to scale up their data analysis without learning many new concepts.
Ray is a more general tool that helps you build and run distributed applications. It is particularly strong in machine learning and AI tasks.
Ray also has extra libraries built on top of it, like:
- Ray Tune for tuning hyperparameters in machine learning
- Ray Train for training models on multiple GPUs
- Ray Serve for deploying models as web services
Ray is great if you want to build scalable machine learning pipelines or deploy AI applications that need to run complex tasks in parallel.
# Feature Comparison
A structured comparison of Dask and Ray based on core attributes:
Feature | Dask | Ray |
---|---|---|
Primary Abstraction | DataFrames, Arrays, Delayed tasks | Remote functions, Actors |
Best For | Scalable data processing, machine learning pipelines | Distributed machine learning training, tuning, and serving |
Ease of Use | High for Pandas/NumPy users | Moderate, more boilerplate |
Ecosystem | Integrates with scikit-learn, XGBoost | Built-in libraries: Tune, Serve, RLlib |
Scalability | Very good for batch processing | Excellent, more control and flexibility |
Scheduling | Centralized scheduler with work stealing | Distributed scheduler for dynamic tasks and actors |
Cluster Management | Native or via Kubernetes, YARN | Ray Dashboard, Kubernetes, AWS, GCP |
Community/Maturity | Older, mature, widely adopted | Growing fast, strong machine learning support |
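The "Delayed tasks" abstraction listed for Dask in the table can be sketched as follows. The function names `square` and `total` are illustrative; the point is that calling a delayed function records work in a task graph instead of running it.

```python
from dask import delayed

@delayed
def square(x):
    return x * x

@delayed
def total(values):
    return sum(values)

# Calling delayed functions only builds a task graph; nothing has run yet.
graph = total([square(i) for i in range(5)])

# compute() hands the graph to the scheduler, which is free to run the
# independent square() calls in parallel before summing them.
result = graph.compute()
print(result)  # 0 + 1 + 4 + 9 + 16 = 30
```

Ray's remote functions play a similar role, but they start executing as soon as they are submitted rather than waiting for an explicit `compute()` call.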
# When to Use What?
Choose Dask if you:
- Use pandas/NumPy and want scalability
- Process tabular or array-like data
- Perform batch ETL or feature engineering
- Need DataFrame or array abstractions with lazy execution
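Lazy execution, mentioned in the last point, means that operations on a Dask collection describe work without running it. The sketch below uses a tiny NumPy array as an illustrative stand-in for data that would normally be too large for memory.

```python
import numpy as np
import dask.array as da

# Tiny 3 x 4 array for illustration; real inputs would be far larger.
data = np.arange(12, dtype=float).reshape(3, 4)

# Wrap it as a Dask array made of two 3 x 2 column blocks.
x = da.from_array(data, chunks=(3, 2))

# This is still a lazy Dask array: no means have been computed yet.
col_means = x.mean(axis=0)
print(type(col_means).__name__)  # Array

# compute() executes the graph and returns a NumPy array.
result = col_means.compute()
print(result)  # [4. 5. 6. 7.]
```

Because nothing runs until `compute()`, Dask can see the whole chain of operations before choosing how to schedule the blocks.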
Choose Ray if you:
- Need to run many independent Python functions in parallel
- Want to build machine learning pipelines, serve models, or manage long-running tasks
- Need microservice-like scaling with stateful tasks
# Ecosystem Tools
Both libraries offer or support a range of tools to cover the data science lifecycle, but with different emphasis:
Task | Dask | Ray |
---|---|---|
DataFrames | dask.dataframe | Modin (built on Ray or Dask) |
Arrays | dask.array | No native support, rely on NumPy |
Hyperparameter tuning | Manual or with Dask-ML | Ray Tune (advanced features) |
Machine learning pipelines | dask-ml, custom workflows | Ray Train, Ray Tune, Ray AIR |
Model serving | Custom Flask/FastAPI setup | Ray Serve |
Reinforcement Learning | Not supported | RLlib |
Dashboard | Built-in, very detailed | Built-in, simplified |
# Real-World Scenarios
// Large-Scale Data Cleaning and Feature Engineering
Use Dask.
Why? Dask integrates smoothly with pandas and NumPy. Many data teams already use these tools. If your dataset is too large to fit in memory, Dask can split it into smaller parts and process these parts in parallel. This helps with tasks like cleaning data and creating new features.
Example:
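import dask.dataframe as dd
import numpy as np

df = dd.read_csv('s3://data/large-dataset-*.csv')  # lazy read; s3:// paths need the s3fs package
df = df[df['amount'] > 100]  # keep rows with amount > 100
df['log_amount'] = df['amount'].map_partitions(np.log)  # natural log, one partition at a time
df.to_parquet('s3://processed/output/')  # writing the output triggers the computation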
This code reads multiple large CSV files from an S3 bucket using Dask in parallel. It filters rows where the amount column is greater than 100, applies a log transformation, and saves the result as Parquet files.
// Parallel Hyperparameter Tuning for Machine Learning Models
Use Ray.
Why? Ray Tune is great for trying different settings when training machine learning models. It integrates with tools like PyTorch and XGBoost, and it can stop bad runs early to save time.
Example:
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_fn(config):
    # Model training logic here; it must report an "accuracy"
    # metric to Tune for the scheduler below to act on.
    ...

tune.run(
    train_fn,
    config={"lr": tune.grid_search([0.01, 0.001, 0.0001])},
    scheduler=ASHAScheduler(metric="accuracy", mode="max"),
)
This code defines a training function and uses Ray Tune to test three learning rates in parallel. The ASHA scheduler compares the reported accuracy of the trials and stops the weaker ones early, so resources go to the more promising configurations.
// Distributed Array Computations
Use Dask.
Why? Dask arrays are helpful when working with numerical arrays that are too large for memory. Dask splits the array into blocks and processes them in parallel.
Example:
import dask.array as da

x = da.random.random((10000, 10000), chunks=(1000, 1000))  # 100 blocks of 1000 x 1000
y = x.mean(axis=0).compute()  # column means, returned as a NumPy array
This code creates a large random array divided into chunks that can be processed in parallel. It then calculates the mean of each column using Dask’s parallel computing power.
// Building an End-to-End Machine Learning Service
Use Ray.
Why? Ray is designed not just for model training but also for serving and lifecycle management. With Ray Serve, you can deploy models in production, run preprocessing logic in parallel, and even scale stateful actors.
Example:
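from ray import serve

@serve.deployment
class ModelDeployment:
    def __init__(self):
        # load_model() is a placeholder for your own model-loading code
        self.model = load_model()

    async def __call__(self, request):
        # Ray Serve passes HTTP requests in as Starlette Request objects
        data = await request.json()
        return self.model.predict([data])[0]

serve.run(ModelDeployment.bind())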
This code defines a class to load a machine learning model and serve it through an API using Ray Serve. The class receives a request, makes a prediction using the model, and returns the result.
# Final Recommendations
Use Case | Recommended Tool |
---|---|
Scalable data analysis (Pandas-style) | Dask |
Large-scale machine learning training | Ray |
Hyperparameter optimization | Ray |
Out-of-core DataFrame computation | Dask |
Real-time machine learning model serving | Ray |
Custom pipelines with high parallelism | Ray |
Integration with PyData Stack | Dask |
# Conclusion
Ray and Dask are both tools that help data scientists handle large amounts of data and run programs faster. Ray is good for tasks that need a lot of flexibility, like machine learning projects. Dask is useful if you want to work with big datasets using tools similar to pandas or NumPy.
Which one you choose depends on what your project needs and the type of data you have. It’s a good idea to try both on small examples to see which one fits your work better.
Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master’s degree in Computer Science from the University of Liverpool.
DeepMind’s Demis Hassabis says calling AI PhD Intelligences is ‘Nonsense’

Demis Hassabis, CEO of Google DeepMind, dismissed claims that today’s AI systems are PhD intelligences, calling the label nonsense and arguing that current models lack the consistency and reasoning needed for true general intelligence.
“They’re not PhD intelligences,” Hassabis said in a recent All-In Podcast interview. “They have some capabilities that are PhD level, but they’re not in general capable, and that’s exactly what general intelligence should be: performing across the board at the PhD level.”
Hassabis’ statements come after OpenAI dubbed its latest AI model, GPT-5, as PhD-level.
Hassabis explained that while advanced language models can demonstrate impressive skills, they can also fail at simple problems. “As we all know, interacting with today’s chatbots, if you pose the question in a certain way, they can make simple mistakes with even high school maths and simple counting. That shouldn’t be possible for a true AGI system,” he said.
The DeepMind chief said artificial general intelligence (AGI) is still five to ten years away, pointing to missing capabilities such as continual learning and intuitive reasoning. “We are lacking consistency,” he said. “One of the things that separates a great scientist from a good scientist is creativity—the ability to spot patterns across subject areas. One day AI may be able to do this, but it doesn’t yet have the reasoning capabilities needed for such breakthroughs.”
On industry benchmarks, Hassabis pushed back against the idea of performance stagnation. “We’re not seeing that internally. We’re still seeing a huge rate of progress,” he said, countering reports that suggested convergence or slowing improvement among large language models.
Hassabis said that while scaling may deliver advances, one or two breakthroughs will still be required in the coming years.
Databricks Invests in Naveen Rao’s New AI Hardware Startup

Ali Ghodsi, CEO and Co-Founder of Databricks, announced in a LinkedIn post on September 13 that the company is investing in a new AI hardware startup launched by Naveen Rao, former vice president of AI at Databricks.
Details of the company’s name, funding size, and product roadmap have not been disclosed yet.
“Over six months ago, Naveen Rao and I started discussing the potential to have a massive impact on the world of AI,” Ghodsi wrote. “Today, I’m excited to share that Naveen Rao is starting a company that I think has the potential to revolutionise the AI hardware space in fundamental ways.”
Rao, who previously founded Nervana (acquired by Intel) and MosaicML (acquired by Databricks), said the new project will focus on energy-efficient computing for AI.
“The new project is about rethinking the foundations of compute with respect to AI to build a new machine that is vastly more power efficient. Brain Scale Efficiency!” he said.
Ghodsi highlighted Rao’s track record in entrepreneurship and his contributions at Databricks. “If anyone can pull this off, it’s Naveen,” he noted, adding that Rao will continue advising Databricks while leading the new venture.
Databricks has closed a $10 billion Series J funding round, raising its valuation to $62 billion. The company’s revenue is approaching a $3 billion annual run rate, with forecasts indicating it could turn free cash flow positive by late 2024.
Growth is being fueled by strong adoption of the Databricks Data Intelligence Platform, which integrates generative AI accelerators. The platform is seeing rapid uptake across enterprises, positioning Databricks as one of the leading players in the enterprise AI stack.
Rao described the move as an example of Databricks supporting innovation in the AI ecosystem. “I’m very proud of all the work we did at Mosaic and Databricks and love to see how Databricks will be driving the frontier of AI in the enterprise,” he said.
OpenAI Announces Grove, a Cohort for ‘Pre-Idea Individuals’ to Build in AI

OpenAI announced a new program called Grove on September 12, which is aimed at assisting technical talent at the very start of their journey in building startups and companies.
The ChatGPT maker says that it isn’t a traditional startup accelerator program, and offers ‘pre-idea’ individuals access to a dense talent network, which includes OpenAI’s researchers, and other resources to build their ideas in the AI space.
The program will begin with five weeks of content hosted in OpenAI’s headquarters in San Francisco, United States. This includes in-person workshops, weekly office hours, and mentorship with OpenAI’s leaders. The first Grove cohort will consist of approximately 15 participants, and OpenAI is recommending individuals from all domains and disciplines across various experience levels.
“In addition to technical support and community, participants will also have the opportunity to get hands-on with new OpenAI tools and models before general availability,” said OpenAI in the blog post.
Once the program is completed, the company says that participants will be able to explore opportunities to raise capital or pursue other avenues, internal or external to OpenAI. Interested applicants can fill out the form on OpenAI’s website by September 24.
Grove is in addition to other programs such as ‘Pioneers’ and ‘OpenAI for Startups’, which were announced earlier this year.
The OpenAI Pioneers program is an initiative that deploys AI to real-world use cases by assisting companies that intend to do so. OpenAI’s research teams will collaborate with these companies to solve the problems and expand their capabilities.
On the other hand, OpenAI for startups is an initiative designed to provide founders with AI tools, resources, and community support to scale their AI products. For instance, the program includes ‘live build hours’ where engineers from OpenAI provide hands-on demos, webinars, access to code repositories, ask me anything (AMA) sessions, case studies, and more.
It also includes real-life meetups, events, and more to assist founders in their journey. If startups are backed by venture capital firms that are partners of OpenAI (Thrive Capital, Sequoia, a16z, Kleiner Perkins, and Conviction Partners), they are eligible for free API credits, rate limit upgrades, and interactions with the company’s team members, alongside invites to exclusive events.