Jobs & Careers
AI to Write Community Notes for Fact Checking on X
X, the social media platform run by Elon Musk, has launched a pilot for an “AI Note Writer,” a new API allowing developers to build AI systems that propose fact-checking notes on posts, with final judgment still in human hands.
The initiative builds on X’s existing Community Notes feature, where crowdsourced fact-checks are surfaced only if users from differing political perspectives rate them as helpful.
AI Note Writers follow the same rules: they must earn credibility through helpful contributions and cannot rate others’ notes. Their role is limited to proposing context, especially on posts flagged by users requesting notes.
“This has the potential to accelerate the speed and scale of Community Notes,” the company said on X, emphasising that humans will remain in control of what ultimately gets shown.
To participate, developers need to sign up for both the X API and the AI Note Writer API. Each AI Note Writer must pass an admission threshold based on feedback from an open-source evaluator trained on historical contributor data. Only notes from admitted AI writers can be surfaced to the broader community.
The company says developers can use GitHub Actions with Grok or other third-party LLMs to build an AI Note Writer.
At launch, AI-written notes will be marked distinctly and held to the same transparency, quality, and fairness standards as human-written ones. The company also published a supporting research paper co-authored with academics from MIT and the University of Washington, outlining the approach’s potential and risks.
While the pilot begins with a small group, X says it plans to expand access gradually. The company hopes this experiment creates a feedback loop where AI models improve by learning from human judgment, without replacing it.
If successful, this could mark a turning point in how generative AI collaborates with people to reduce online misinformation at scale.
7 DuckDB SQL Queries That Save You Hours of Pandas Work
The Pandas library has one of the fastest-growing communities, and that popularity has opened the door to alternatives such as Polars. In this article, we will explore another alternative, DuckDB.
DuckDB is an SQL database that you can run right in your notebook. No setup or server is needed. It is easy to install and works alongside Pandas.
Unlike other SQL databases, you don’t need to configure a server. It just works with your notebook after installation. That means no local setup headaches; you can start writing code right away. DuckDB handles filtering, joins, and aggregations with clean SQL syntax, and it often performs significantly better than Pandas on large datasets.
So enough with the terms, let’s get started!
Data Project – Uber Business Modeling
We will use it with Jupyter Notebook, combining it with Python for data analysis. To make things more exciting, we will work on a real-life data project. Let’s get started!
Here is the link to the data project we’ll be using in this article. It’s a data project from Uber called Partner’s Business Modeling.
Uber used this data project in the recruitment process for the data science positions, and you will be asked to analyze the data for two different scenarios.
- Scenario 1: Compare the cost of two bonus programs designed to get more drivers online during a busy day.
- Scenario 2: Calculate and compare the annual net income of a traditional taxi driver vs one who partners with Uber and buys a car.
Loading Dataset
Let’s load the dataset into a DataFrame first. We need this step because we will register the DataFrame with DuckDB in the following sections.
import pandas as pd
df = pd.read_csv("dataset_2.csv")
Exploring the Dataset
Here are the first few rows:
Let’s see all the columns.
Here is the output.
Connect DuckDB and Register the DataFrame
Good, it is a really straightforward dataset, but how can we connect DuckDB with this dataset?
First, if you have not installed it yet, install DuckDB (for example, with pip install duckdb).
Connecting with DuckDB is easy. Also, if you want to read the documentation, check it out here.
Now, here is the code to make a connection and register the dataframe.
import duckdb
con = duckdb.connect()
# Register under the name "data", because every query below reads FROM data
con.register("data", df)
Good, let’s start exploring seven queries that will save you hours of Pandas work!
1. Multi-Criteria Filtering for Complex Eligibility Rules
One of SQL’s biggest advantages is how naturally it handles filtering, especially when several conditions are involved.
Implementation of Multi-Criteria Filtering in DuckDB vs Pandas
DuckDB allows you to apply multiple filters using SQL’s WHERE clause and AND logic, which scales well as the number of filters grows.
SELECT
*
FROM data
WHERE condition_1
AND condition_2
AND condition_3
AND condition_4
Now let’s see how we’d write the same logic in Pandas. In Pandas, the same logic is expressed using chained boolean masks with brackets, which can get verbose under many conditions.
filtered_df = df[
(df["condition_1"]) &
(df["condition_2"]) &
(df["condition_3"]) &
(df["condition_4"])
]
Both methods are equally readable and applicable to basic use. DuckDB feels more natural and cleaner as the logic gets more complex.
Multi-Criteria Filtering for the Uber Data Project
In this case, we want to find drivers who qualify for a specific Uber bonus program.
According to the rules, the drivers must:
- Be online for at least 8 hours
- Complete at least 10 trips
- Accept at least 90% of ride requests
- Have a rating of 4.7 or above
Now all we have to do is write a query that applies all of these filters. Here is the code.
SELECT
COUNT(*) AS qualified_drivers,
COUNT(*) * 50 AS total_payout
FROM data
WHERE "Supply Hours" >= 8
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
AND "Trips Completed" >= 10
AND Rating >= 4.7
But to execute this code with Python, we need to wrap it in con.execute(""" """) and call fetchdf(), as shown below:
con.execute("""
SELECT
COUNT(*) AS qualified_drivers,
COUNT(*) * 50 AS total_payout
FROM data
WHERE "Supply Hours" >= 8
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
AND "Trips Completed" >= 10
AND Rating >= 4.7
""").fetchdf()
We will do this throughout the article. Now that you know how to run it in a Jupyter notebook, we’ll show only the SQL code from now on, and you’ll know how to convert it to the Pythonic version.
Good. Now, remember that the data project wants us to calculate the total payout for Option 1.
COUNT(*) gives us the number of qualifying drivers. Because the payout is $50 per driver, we multiply that count by 50 with COUNT(*) * 50.
Here is the output.
2. Fast Aggregation to Estimate Business Incentives
SQL is great for quickly aggregating, especially when you need to summarize data across rows.
Implementation of Aggregation in DuckDB vs Pandas
DuckDB lets you aggregate values across rows using SQL functions like SUM and COUNT in one compact block.
SELECT
COUNT(*) AS num_rows,
SUM(column_name) AS total_value
FROM data
WHERE some_condition
In pandas, you first need to filter the dataframe, then separately count and sum using chaining methods.
filtered = df[df["some_condition"]]
num_rows = filtered.shape[0]
total_value = filtered["column_name"].sum()
DuckDB is more concise and easier to read, and does not require managing intermediate variables.
Aggregation in Uber Data Project
Good, let’s move on to the second bonus scheme, Option 2. According to the project description, drivers will receive $4 per trip if:
- They complete at least 12 trips.
- Have a rating of 4.7 or better.
This time, instead of just counting the drivers, we need to add the number of trips they completed since the bonus is paid per trip, not per person.
SELECT
COUNT(*) AS qualified_drivers,
SUM("Trips Completed") * 4 AS total_payout
FROM data
WHERE "Trips Completed" >= 12
AND Rating >= 4.7
The count here tells us how many drivers qualify. However, to calculate the total payout, we sum their completed trips and multiply by $4, as required by Option 2.
Here is the output.
With DuckDB, we don’t need to loop through the rows or build custom aggregations. The SUM function takes care of everything we need.
3. Detect Overlaps and Differences Using Boolean Logic
In SQL, you can easily combine the conditions by using Boolean Logic, such as AND, OR, and NOT.
Implementation of Boolean Logic in DuckDB vs Pandas
DuckDB supports boolean logic natively in the WHERE clause using AND, OR, and NOT.
SELECT *
FROM data
WHERE condition_a
AND condition_b
AND NOT (condition_c)
Pandas requires a combination of logical operators with masks and parentheses, including the use of “~” for negation.
filtered = df[
(df["condition_a"]) &
(df["condition_b"]) &
~(df["condition_c"])
]
While both are functional, DuckDB is easier to reason about when the logic involves exclusions or nested conditions.
Boolean Logic for Uber Data Project
Now that we have calculated Option 1 and Option 2, it is time to compare them. The next question asks how many drivers qualify for a bonus under Option 1 but not under Option 2.
This is where we can use Boolean Logic. We’ll use a combination of AND and NOT.
SELECT COUNT(*) AS only_option1
FROM data
WHERE "Supply Hours" >= 8
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) >= 90
AND "Trips Completed" >= 10
AND Rating >= 4.7
AND NOT ("Trips Completed" >= 12 AND Rating >= 4.7)
Here is the output.
Let’s break it down:
- The first four conditions are here for Option 1.
- The NOT(..) part is used to exclude drivers who also qualify for Option 2.
It is pretty straightforward, right?
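For comparison, the same exclusion can be written in Pandas by naming each rule as a boolean mask and combining the masks. This is a sketch on hypothetical rows, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical sample rows standing in for dataset_2.csv
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Trips Completed": [12, 10, 8, 15, 9],
    "Accept Rate": ["95%", "92%", "85%", "88%", "70%"],
    "Supply Hours": [9, 8, 10, 11, 5],
    "Rating": [4.8, 4.8, 4.9, 4.9, 4.8],
})

accept_rate = df["Accept Rate"].str.rstrip("%").astype(float)

# One named mask per bonus program
option1 = (
    (df["Supply Hours"] >= 8)
    & (accept_rate >= 90)
    & (df["Trips Completed"] >= 10)
    & (df["Rating"] >= 4.7)
)
option2 = (df["Trips Completed"] >= 12) & (df["Rating"] >= 4.7)

# "~" negates a mask, mirroring NOT (...) in the SQL query
only_option1 = int((option1 & ~option2).sum())
only_option2 = int((~option1 & option2).sum())
both_options = int((option1 & option2).sum())

print(only_option1, only_option2, both_options)
```

Naming the masks keeps the overlap and the two differences readable, which is the same benefit the SQL version gets from grouping the Option 2 rule inside NOT (...).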
4. Quick Cohort Sizing with Conditional Filters
Sometimes, you want to understand how big a specific group or cohort is within your data.
Implementation of Conditional Filters in DuckDB vs Pandas
DuckDB handles cohort filtering and percentage calculation with one SQL query, even including subqueries.
SELECT
ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM data), 2) AS percentage
FROM data
WHERE condition_1
AND condition_2
AND condition_3
Pandas requires filtering, counting, and manual division to calculate percentages.
filtered = df[
(df["condition_1"]) &
(df["condition_2"]) &
(df["condition_3"])
]
percentage = round(100.0 * len(filtered) / len(df), 2)
DuckDB here is cleaner and faster. It minimizes the number of steps and avoids repeated code.
Cohort Sizing For Uber Data Project
Now we are at the last question of Scenario 1. Here, Uber wants us to find the share of drivers who fell short on trips and acceptance rate yet still had a high rating, specifically the drivers who:
- Completed fewer than 10 trips
- Had an acceptance rate lower than 90%
- Had a rating of 4.7 or higher
These are three separate filters, and we want the percentage of drivers who satisfy all of them. Let’s see the query.
SELECT
ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM data), 2) AS percentage
FROM data
WHERE "Trips Completed" < 10
AND CAST(REPLACE("Accept Rate", '%', '') AS DOUBLE) < 90
AND Rating >= 4.7
Here is the output.
Here, we filtered the rows where all three conditions were satisfied, counted them, and divided them by the total number of drivers to get a percentage.
5. Basic Arithmetic Queries for Revenue Modeling
Now, let’s say you want to do some basic math. You can write expressions directly into your SELECT statement.
Implementation of Arithmetic in DuckDB vs Pandas
DuckDB allows arithmetic to be written directly in the SELECT clause like a calculator.
SELECT
daily_income * work_days * weeks_per_year AS annual_revenue,
weekly_cost * weeks_per_year AS total_cost,
(daily_income * work_days * weeks_per_year) - (weekly_cost * weeks_per_year) AS net_income
FROM data
Pandas requires multiple intermediate calculations in separate variables for the same result.
daily_income = 200
weeks_per_year = 49
work_days = 6
weekly_cost = 500
annual_revenue = daily_income * work_days * weeks_per_year
total_cost = weekly_cost * weeks_per_year
net_income = annual_revenue - total_cost
DuckDB simplifies the math logic into a readable SQL block, whereas Pandas gets a bit cluttered with variable assignments.
Basic Arithmetic in Uber Data Project
In Scenario 2, Uber asks us to calculate how much money the driver makes per year, after expenses, without partnering with Uber. The expenses include gas, rent, and insurance.
Now let’s calculate the annual revenue and subtract the expenses from it.
SELECT
200 * 6 * (52 - 3) AS annual_revenue,
200 * (52 - 3) AS gas_expense,
500 * (52 - 3) AS rent_expense,
400 * 12 AS insurance_expense,
(200 * 6 * (52 - 3))
- (200 * (52 - 3) + 500 * (52 - 3) + 400 * 12) AS net_income
Here is the output.
With DuckDB, you can write this as a single SQL block. You don’t need Pandas DataFrames or manual looping!
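The arithmetic in the query is easy to cross-check by hand. The sketch below repeats it in plain Python with named constants; the units (per day, per week, per month) are taken from the variable names used in the final query of this article.

```python
# Inputs from the scenario, named for readability
WEEKS_WORKED = 52 - 3          # 52 weeks minus 3 weeks off
DAYS_PER_WEEK = 6
FARE_PER_DAY = 200
GAS_PER_WEEK = 200
RENT_PER_WEEK = 500
INSURANCE_PER_MONTH = 400

annual_revenue = FARE_PER_DAY * DAYS_PER_WEEK * WEEKS_WORKED   # 58800
gas_expense = GAS_PER_WEEK * WEEKS_WORKED                      # 9800
rent_expense = RENT_PER_WEEK * WEEKS_WORKED                    # 24500
insurance_expense = INSURANCE_PER_MONTH * 12                   # 4800

total_expense = gas_expense + rent_expense + insurance_expense  # 39100
net_income = annual_revenue - total_expense                     # 19700

print(annual_revenue, total_expense, net_income)
```

The net income of 19,700 is the baseline that the later sections try to preserve once the driver buys a car.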
6. Conditional Calculations for Dynamic Expense Planning
What if your cost structure changes based on certain conditions?
Implementation of Conditional Calculations in DuckDB vs Pandas
DuckDB lets you apply conditional logic using arithmetic adjustments inside your query.
SELECT
original_cost * 1.05 AS increased_cost,
original_cost * 0.8 AS discounted_cost,
0 AS removed_cost,
(original_cost * 1.05 + original_cost * 0.8) AS total_new_cost
FROM data
Pandas uses the same logic with multiple math lines and manual updates to variables.
weeks_worked = 49
gas = 200
insurance = 400
gas_expense = gas * 1.05 * weeks_worked
insurance_expense = insurance * 0.8 * 12
rent_expense = 0
total = gas_expense + insurance_expense
DuckDB turns what would be a multi-step logic in pandas into a single SQL expression.
Conditional Calculations in Uber Data Project
In this scenario, we now model what happens if the driver partners with Uber and buys a car. The expenses change as follows:
- Gas cost increases by 5%
- Insurance decreases by 20%
- No more rent expense
con.execute("""
SELECT
200 * 1.05 * 49 AS gas_expense,
400 * 0.8 * 12 AS insurance_expense,
0 AS rent_expense,
(200 * 1.05 * 49) + (400 * 0.8 * 12) AS total_expense
""").fetchdf()
Here is the output.
7. Goal-Driven Math for Revenue Targeting
Sometimes, your analysis can be driven by a business goal like hitting a revenue target or covering a one-time cost.
Implementation of Goal-Driven Math in DuckDB vs Pandas
DuckDB handles multi-step logic using CTEs. It makes the query modular and easy to read.
WITH vars AS (
SELECT base_income, cost_1, cost_2, target_item
FROM data
),
calc AS (
SELECT
base_income - (cost_1 + cost_2) AS current_profit,
cost_1 * 1.1 + cost_2 * 0.8 + target_item AS new_total_expense
FROM vars
),
final AS (
SELECT
current_profit + new_total_expense AS required_revenue,
required_revenue / 49 AS required_weekly_income
FROM calc
)
SELECT required_weekly_income FROM final
Pandas requires nesting of calculations and reuse of earlier variables to avoid duplication.
weeks = 49
original_income = 200 * 6 * weeks
original_cost = (200 + 500) * weeks + 400 * 12
net_income = original_income - original_cost
# new expenses + car cost
new_gas = 200 * 1.05 * weeks
new_insurance = 400 * 0.8 * 12
car_cost = 40000
required_revenue = net_income + new_gas + new_insurance + car_cost
required_weekly_income = required_revenue / weeks
DuckDB allows you to build a logic pipeline step by step, without cluttering your notebook with scattered code.
Goal-Driven Math in Uber Data Project
Now that we have modeled the new costs, let’s answer the final business question:
How much more does the driver need to earn per week to do both?
- Pay off a $40,000 car within a year
- Maintain the same yearly net income
Now let’s write the code representing this logic.
WITH vars AS (
SELECT
52 AS total_weeks_per_year,
3 AS weeks_off,
6 AS days_per_week,
200 AS fare_per_day,
400 AS monthly_insurance,
200 AS gas_per_week,
500 AS vehicle_rent,
40000 AS car_cost
),
base AS (
SELECT
total_weeks_per_year,
weeks_off,
days_per_week,
fare_per_day,
monthly_insurance,
gas_per_week,
vehicle_rent,
car_cost,
total_weeks_per_year - weeks_off AS weeks_worked,
(fare_per_day * days_per_week * (total_weeks_per_year - weeks_off)) AS original_annual_revenue,
(gas_per_week * (total_weeks_per_year - weeks_off)) AS original_gas,
(vehicle_rent * (total_weeks_per_year - weeks_off)) AS original_rent,
(monthly_insurance * 12) AS original_insurance
FROM vars
),
compare AS (
SELECT *,
(original_gas + original_rent + original_insurance) AS original_total_expense,
(original_annual_revenue - (original_gas + original_rent + original_insurance)) AS original_net_income
FROM base
),
new_costs AS (
SELECT *,
gas_per_week * 1.05 * weeks_worked AS new_gas,
monthly_insurance * 0.8 * 12 AS new_insurance
FROM compare
),
final AS (
SELECT *,
new_gas + new_insurance + car_cost AS new_total_expense,
original_net_income + new_gas + new_insurance + car_cost AS required_revenue,
required_revenue / weeks_worked AS required_weekly_revenue,
original_annual_revenue / weeks_worked AS original_weekly_revenue
FROM new_costs
)
SELECT
ROUND(required_weekly_revenue, 2) AS required_weekly_revenue,
ROUND(required_weekly_revenue - original_weekly_revenue, 2) AS weekly_uplift
FROM final
Here is the output.
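As a sanity check on the CTE pipeline, the same steps can be followed in plain Python. This is a sketch that mirrors the query's logic with the same inputs; the function name and parameter names are illustrative.

```python
def weekly_targets(weeks_worked=49, days_per_week=6, fare_per_day=200,
                   gas_per_week=200, rent_per_week=500,
                   monthly_insurance=400, car_cost=40000):
    """Return (required_weekly_revenue, weekly_uplift), rounded to 2 decimals."""
    # Baseline: traditional driver who rents the vehicle
    original_revenue = fare_per_day * days_per_week * weeks_worked
    original_expense = (gas_per_week + rent_per_week) * weeks_worked \
        + monthly_insurance * 12
    original_net_income = original_revenue - original_expense

    # New cost structure: gas +5%, insurance -20%, no rent, plus the car
    new_gas = gas_per_week * 1.05 * weeks_worked
    new_insurance = monthly_insurance * 0.8 * 12

    # Revenue needed to keep the same net income and pay off the car
    required_revenue = original_net_income + new_gas + new_insurance + car_cost
    required_weekly = required_revenue / weeks_worked
    original_weekly = original_revenue / weeks_worked

    return round(required_weekly, 2), round(required_weekly - original_weekly, 2)


required_weekly_revenue, weekly_uplift = weekly_targets()
print(required_weekly_revenue, weekly_uplift)
```

With the scenario's inputs this works out to roughly 1,506.73 per week, about 306.73 more than the original 1,200 per week.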
Final Thoughts
In this article, we explored how to connect with DuckDB and analyze data. Instead of using long Pandas functions, we used SQL queries. We also did this using a real-life data project that Uber requested in the data scientist recruitment process.
For data scientists working on analysis-heavy tasks, DuckDB is a lightweight but powerful alternative to Pandas. Try using it on your next project, especially when SQL logic fits the problem better.
Nate Rosidi is a data scientist who also works in product strategy. He is an adjunct professor teaching analytics and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
Fi.Money Launches Protocol to Connect Personal Finance Data with AI Assistants
Fi.Money, a money management platform based in India, has launched what it says is the first consumer-facing implementation of a model context protocol (MCP) for personal finance.
Fi MCP is designed to bring users’ complete financial lives, including bank accounts, mutual funds, loans, insurance, EPF, real estate, gold, and more, seamlessly into the AI assistants of their choice, the company said in a statement.
Users can choose to share this consolidated data with any AI tool, enabling private, intelligent conversations about their money, fully on their terms, it added.
Until now, users have had to stitch together insights from various finance apps, statements, and spreadsheets. When turning to AI tools like ChatGPT or Gemini for advice, they’ve relied on manual inputs, guesswork, or generic prompts.
There was no structured, secure, consent-driven way to help AI understand your actual financial data without sharing screenshots or uploading statements and reports.
The company said that with Fi’s new MCP feature, users can see their entire financial life in a single, unified view.
This data can be privately exported in an AI-readable format or configured for near-real-time syncing with AI assistants.
Once connected, users can ask personal, data-specific questions such as, “Can I afford a six-month career break?” or “What are the mistakes in my portfolio?” and receive context-aware responses based on their actual financial information.
As per the statement, the launch comes at a time when Indian consumers are increasingly seeking digital-first, integrated financial tools. Building on India’s pioneering digital infrastructure, Fi’s MCP represents the next layer of consumer-facing innovation, one that empowers consumers to activate their own data.
Fi Money is the first in the world to let individuals use AI meaningfully with their own money, the company claimed. While most AIs lack context about one’s finances, Fi’s MCP changes that by giving users an AI that actually understands their money.
The Fi MCP is available to all Fi Money users. Any user can download the Fi Money app, consolidate their finances in a few minutes, and start using their data with their preferred AI assistant.
“This is the first time any personal finance app globally has enabled users to securely connect their actual financial data with tools like ChatGPT, Gemini, or Claude,” Sujith Narayanan, co-founder of Fi.Money, said in the statement.
“With MCP, we’re giving users not just a dashboard, but a secure bridge between their financial data and the AI tools they trust. It’s about helping people ask better questions and get smarter answers about their money,” he added.
BRICS Leaders Call For Data Protection Against Unauthorised AI Use
Leaders from the BRICS coalition of developing countries are set to advocate for safeguards against unauthorised AI usage to prevent excessive data gathering and to establish systems for fair compensation, as outlined in a draft statement seen by Reuters.
Leading tech companies, predominantly located in wealthier nations, have pushed back against demands to pay copyright fees for content used in training AI systems.
On July 6, the heads of the 11 largest emerging economies ratified the Joint Declaration of the 17th BRICS Summit in Rio de Janeiro.
Prime Minister Narendra Modi stated that India views AI as a tool to augment human values and abilities, emphasising that both concerns and the promotion of innovation in AI governance should be prioritised equally. He stressed the importance of collective efforts in developing Responsible AI.
He argued that in the 21st century, humanity’s prosperity and progress are increasingly reliant on technology, particularly artificial intelligence. While AI offers significant potential to transform daily life, it also raises important concerns related to risks, ethics, and bias. “We see AI as a medium to enhance human values and capabilities,” the Prime Minister said.
Modi also invited the BRICS partners to the “AI Impact Summit” that India will host next year.
For the first time, AI governance is a key focus in the BRICS agenda, highlighting a Global South perspective on this technology.
In their joint declaration, the countries recognise that AI offers a unique opportunity for progress. Still, effective global governance is crucial for addressing risks and meeting the needs of all countries, particularly in the Global South.
“A collective global effort is needed to establish AI governance that upholds our shared values, addresses risks, builds trust, and ensures broad and inclusive international collaboration and access,” the countries said in a joint statement.