AI Research

RoboCat: A self-improving robotic agent

Authors

The RoboCat team

New foundation agent learns to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.

Robots are quickly becoming part of our everyday lives, but they’re often only programmed to perform specific tasks well. While harnessing recent advances in AI could lead to robots that help in many more ways, progress in building general-purpose robots is slower, in part because of the time needed to collect real-world training data.

Our latest paper introduces a self-improving AI agent for robotics, RoboCat, that learns to perform a variety of tasks across different arms, and then self-generates new training data to improve its technique.

Previous research has explored how to develop robots that can learn to multi-task at scale and combine the understanding of language models with the real-world capabilities of a helper robot. RoboCat is the first agent to solve and adapt to multiple tasks and do so across different, real robots.

RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws from a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step towards creating a general-purpose robot.

How RoboCat improves itself

RoboCat is based on our multimodal model Gato (Spanish for “cat”), which can process language, images, and actions in both simulated and physical environments. We combined Gato’s architecture with a large training dataset of sequences of images and actions of various robot arms solving hundreds of different tasks.

After this first round of training, we launched RoboCat into a “self-improvement” training cycle with a set of previously unseen tasks. The learning of each new task followed five steps:

  1. Collect 100-1000 demonstrations of a new task or robot, using a robotic arm controlled by a human.
  2. Fine-tune RoboCat on this new task/arm, creating a specialised spin-off agent.
  3. The spin-off agent practises on this new task/arm an average of 10,000 times, generating more training data.
  4. Incorporate the demonstration data and self-generated data into RoboCat’s existing training dataset.
  5. Train a new version of RoboCat on the new training dataset.
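The five-step cycle above can be sketched as a plain-Python loop. Everything here is illustrative: the function names, data shapes, and counts are hypothetical stand-ins for the paper's actual training pipeline, not part of any published RoboCat code.

```python
# Illustrative sketch of the self-improvement cycle (hypothetical stand-ins:
# "agent" is just a version counter, "dataset" a list of episode records).

def collect_demonstrations(task, n):
    # Step 1: human-teleoperated episodes of the new task/arm.
    return [{"task": task, "source": "human"} for _ in range(n)]

def fine_tune(agent, demos):
    # Step 2: specialise a spin-off agent on the demonstrations.
    return {"base": agent, "specialised_on": demos[0]["task"]}

def practise(spin_off, task, episodes):
    # Step 3: the spin-off practises, generating more training data.
    return [{"task": task, "source": "self-generated"} for _ in range(episodes)]

def self_improvement_cycle(agent, dataset, task, n_demos=100, n_practice=1000):
    demos = collect_demonstrations(task, n_demos)
    spin_off = fine_tune(agent, demos)
    self_generated = practise(spin_off, task, n_practice)
    # Step 4: fold both data sources back into the main dataset.
    dataset = dataset + demos + self_generated
    # Step 5: train the next RoboCat version on the enlarged dataset.
    return agent + 1, dataset

agent, dataset = self_improvement_cycle(agent=1, dataset=[], task="stack-blocks")
print(agent, len(dataset))  # -> 2 1100
```

Each pass through the loop grows the dataset, which is what makes the cycle "virtuous": the next version trains on strictly more (and more diverse) data than the last.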

RoboCat’s training cycle, boosted by its ability to autonomously generate additional training data.

The combination of all this training means the latest RoboCat is based on a dataset of millions of trajectories, from both real and simulated robotic arms, including self-generated data. We used four different types of robots and many robotic arms to collect vision-based data representing the tasks RoboCat would be trained to perform.

RoboCat learns from a diverse range of training data types and tasks: Videos of a real robotic arm picking up gears, a simulated arm stacking blocks and RoboCat using a robotic arm to pick up a cucumber.

Learning to operate new robotic arms and solve more complex tasks

Thanks to its diverse training, RoboCat learned to operate different robotic arms within a few hours. While it had been trained on arms with two-pronged grippers, it was able to adapt to a more complex arm with a three-fingered gripper and twice as many controllable inputs.

Left: A new robotic arm RoboCat learned to control
Right: Video of RoboCat using the arm to pick up gears

After observing 1000 human-controlled demonstrations, collected in just hours, RoboCat could direct this new arm dexterously enough to pick up gears successfully 86% of the time. With the same level of demonstrations, it could adapt to solve tasks that combined precision and understanding, such as removing the correct fruit from a bowl and solving a shape-matching puzzle, which are necessary for more complex control.

Examples of tasks RoboCat can adapt to solve after 500-1000 demonstrations.

The self-improving generalist

RoboCat has a virtuous cycle of training: the more new tasks it learns, the better it gets at learning additional new tasks. The initial version of RoboCat was successful just 36% of the time on previously unseen tasks, after learning from 500 demonstrations per task. But the latest RoboCat, which had trained on a greater diversity of tasks, more than doubled this success rate on the same tasks.

The big difference in performance between the initial RoboCat (one round of training) and the final version (extensive and diverse training, including self-improvement), after both versions were fine-tuned on 500 demonstrations of previously unseen tasks.

These improvements were due to RoboCat’s growing breadth of experience, similar to how people develop a more diverse range of skills as they deepen their learning in a given domain. RoboCat’s ability to independently learn skills and rapidly self-improve, especially when applied to different robotic devices, will help pave the way toward a new generation of more helpful, general-purpose robotic agents.





Billionaire Philippe Laffont Just Sold Coatue Management’s Stake in Super Micro Computer and Piled Into Another Artificial Intelligence (AI) Giant Up Over 336,000% Since Its IPO


Philippe Laffont is part of an elite group of investors called the Tiger Cubs, who worked for Julian Robertson’s Tiger Management in the 1990s.

In the 1990s, an elite group of investors worked for a tech-focused hedge fund called Tiger Management, led by the legendary investor Julian Robertson. Not only did Robertson mentor this group of investors, but he would go on to seed many of their future hedge funds as the talented group, referred to as the Tiger Cubs, went on to become great investors in their own right.

Philippe Laffont, the founder of Coatue Management, is part of this group, and is now viewed as one of the great tech investors of the modern era. Coatue Management’s equity holdings were valued at roughly $35 billion at the end of the second quarter. That’s why investors are always paying attention to which stocks Coatue is buying and selling.

In the second quarter, the fund sold its stake in Super Micro Computer (SMCI -5.42%) and piled into another artificial intelligence (AI) giant that has generated a total return of over 336,000% since its initial public offering.


Super Micro Computer: Beating the shorts so far

AI server and tech infrastructure maker Super Micro Computer has been a controversial and volatile play for the past year. In August 2024, short-seller Hindenburg Research published a major short report alleging potential accounting fraud at the company. The report said that Supermicro had rehired executives who were part of an accounting scandal at the company in 2018 that involved understating expenses and overstating revenue.

The stock got hit hard after Supermicro announced it would need to delay its annual 2024 filing to assess its internal controls. However, the company would eventually go on to file its 2024 10-K and did not need to restate any of its financial statements, a good sign for investors. Furthermore, management earlier this year also provided strong fiscal 2026 guidance of $40 billion in revenue, way ahead of consensus at the time. Supermicro’s fiscal year ends on June 30 of each year.

In August, shares struggled after the company reported lower-than-expected quarterly results and weaker-than-expected guidance, due to President Donald Trump’s tariffs, which resulted in less working capital in June, and “specification changes from a major new customer.” Laffont and Coatue loaded up on the stock sometime in the fourth quarter of 2024 and sold in the second quarter of this year, so the fund could have bought the dip after the short report and might have sold over concerns about tariffs, although that’s speculation. Supermicro’s stock is up about 46% this year, so Coatue seems to have timed its trade well.

Supermicro looks quite cheap right now for a stock benefiting from the AI boom, trading around 16 times forward earnings. Tariffs are likely to be an ongoing issue, but if AI demand remains strong, Supermicro, which supplies servers to the likes of Nvidia, should be a major beneficiary. The stock may remain volatile, but I think investors can take a position in the more speculative part of their portfolio.

Oracle: A longtime tech player benefiting from AI

With a market cap of nearly $664 billion, Oracle (ORCL -5.97%) isn’t part of the “Magnificent Seven,” but it’s another large tech company expected to benefit from the AI capital expenditure boom. Coatue purchased over 3.8 million shares in the second quarter, valued at over $843 million.

The cloud giant offers clients the ability to tap into a number of AI solutions, including generative AI and machine learning capabilities that provide automation tools and AI application development, among other services. Similar to Microsoft and Amazon, although not as dominant, Oracle is well positioned as a cloud provider to be a first point of contact for clients looking to add AI capabilities.

In the company’s most recent earnings report for its fourth quarter of fiscal 2025, which ended May 31, Oracle reported results ahead of Wall Street estimates and said that cloud infrastructure revenue sales should increase 70% in fiscal year 2026, after generating 52% growth in fiscal 2025.

Oracle CEO Larry Ellison said the company is particularly well positioned because it has a strong data advantage and has developed one of the most comprehensive databases in the world. “Our applications take all of your application data and make that data available to the most popular AI models,” he said on Oracle’s earnings call for the company’s fiscal fourth quarter of 2025.

“If you like ChatGPT, you use ChatGPT. If you like Grok, you use Grok. You use that in the Oracle Cloud. We are the key enabler for enterprises to use their own data and models. No one else is doing that.”

Having gone public in 1986, Oracle has been a major tech disruptor for decades. The stock is up over 336,000% since its initial public offering and also up over 41% this year. Trading at 34 times forward earnings, the stock is not necessarily cheap, but given its track record and strong expected growth in cloud infrastructure, Oracle can benefit from AI without being as much in the spotlight as some of the Magnificent Seven names.

Bram Berkowitz has no position in any of the stocks mentioned. The Motley Fool has positions in and recommends Amazon, Microsoft, Nvidia, and Oracle. The Motley Fool recommends the following options: long January 2026 $395 calls on Microsoft and short January 2026 $405 calls on Microsoft. The Motley Fool has a disclosure policy.




TikTok Salaries Revealed: How Much AI, E-Commerce Workers Make in 2025


TikTok’s US plans are up in the air due to a divest-or-ban law that puts its future in jeopardy. But it’s still offering six-figure salaries to workers this year in key areas like e-commerce and artificial intelligence.

It’s sought to hire data scientists to sharpen its search algorithm, court workers to grow its e-commerce platform TikTok Shop, and bring in machine learning engineers to improve its content feed and recommendations.

The company’s jobs portal lists over 1,800 open roles in the US in cities like Austin, San Jose, Seattle, and New York.

Like other Big Tech firms, work expectations at TikTok and its owner, ByteDance, are demanding. The company runs performance reviews twice a year, and low scorers can be placed on performance-improvement plans or even shown the door. But the opportunity to work at one of the most influential tech companies in the world continues to draw in talent.

Outside e-commerce, TikTok is shaking up areas like music marketing and young people’s news habits. If it can navigate political tides in the US and China, where ByteDance was founded, it will stand alongside YouTube and a few other players in shaping the next phase of media.

“From a career growth standpoint, you have access to huge budgets and big names,” a former staffer said of working at TikTok. “Everyone in the industry wants to talk to you.”

While TikTok and ByteDance don’t disclose salary information publicly (unless required by state law), they do submit pay ranges in federal filings when they look to hire workers from outside the US.

To understand more about the company’s pay rates, Business Insider reviewed thousands of TikTok salary offers for foreign hires at the company and its owner, ByteDance, for the first three quarters of the reporting year that ran through June 30. The results don’t include equity or other benefits that employees often receive in addition to base pay. But they paint a picture of the range of pay a worker might expect in roles like software engineering, data science, or product management.

The foreign-hire data shows a wide range of salaries at the companies. For example, a finance representative could earn $65,000 a year, and a global head of product and design position could fetch a $949,349 annual salary.

Backend software engineers at TikTok could earn between $144,000 and $301,158, based on the salary data, though rates increased beyond that for specialties like trust and safety. Data scientist positions at TikTok were generally offered between $85,821 and $283,629 — or more in specific areas like e-commerce. For TikTok machine learning scientists, the range was between $168,000 and $390,000, while general marketing managers were offered between $85,000 and $430,000.

These salary offers fall in line with pay rates in federal applications at other Big Tech firms. Meta’s first-quarter visa filings revealed it offered data scientists between $122,760 and $270,000, for example. Meanwhile, a staff software engineer at Google could receive between $220,000 and $323,000, according to the company’s first-quarter filings.

Here are the salary ranges TikTok and ByteDance offered for other roles in key business areas, based on recent applications. TikTok and ByteDance did not respond to requests for comment.

E-commerce and TikTok Shop roles

TikTok Shop – Celebrity Team Live Operation Manager: $94,000

TikTok Shop – US Data Analyst – Logistics: $128,000

TikTok Shop – Campaign Strategy Operations Manager: $132,000

TikTok Shop – Category Manager – Health: $135,000

TikTok Shop – Anti-Fraud Ops Program Mgr – Global Selling: $180,000

TikTok Shop – Data Scientist: $218,000 to $304,000

Product Manager, User Growth Customer Lifecycle-TikTok Shop: $220,000

Strategy Manager, E-Commerce: $228,000 to $230,000

Software Engineer – E-commerce Recommendation Infrastructure: $237,000 to $315,207

TikTok Shop – Inventory Placement Strategy Manager: $250,000

TikTok Shop – Compliance Operation: $257,600

Senior Machine Learning Engineer, E-commerce: $320,000

Tech Lead – E-commerce Recommendation Infrastructure: $320,113

Logistics Procurement Lead, TikTok US E-commerce: $350,000

Senior Data Scientist, Content E-commerce: $350,000

Tech Lead, Global E-commerce Governance Platform: $365,000

Global E-commerce Solutions Manager: $480,000

AI and machine learning roles

Software Engineer (AI Platform): $144,000

Research Scientist (TikTok AI Privacy): $188,000

Product Manager GenAI Safety, Trust & Safety: $218,400

Senior Product Designer, Creation (AI Projects): $221,368

Machine Learning Engineer – Computer Vision: $228,960

Software Engineer, Machine Learning Infrastructure: $270,000 to $320,783

Site Reliability Engineer, AI Applications: $276,000

AI Product Manager: $300,010

Product Manager Lead, Emerging Product & AI Safety: $336,000

AI Security Researcher – Security Flow: $340,000

Senior Machine Learning Engineer, TikTok Recommendation: $386,115

Search roles

Search Product Operations – Creator Search Optimization: $110,000

Software Engineer – TikTok Search Business Infrastructure: $154,880 to $214,720

Product Manager, Search Ads: $205,000

Machine Learning Engineer – Search Ads: $229,200 to $354,000

Machine Learning Engineer – TikTok Search: $241,200 to $300,000

Senior Machine Learning Engineer – TikTok Search Business: $268,920

Product Manager – TikTok Search: $287,500

Product Manager, Search Content Ecosystem: $400,000

Leader of Search and Recommendation Product (ByteDance): $540,552

Search Ads Closed-loop Product Manager: $564,000






How to Build a Conversational Research AI Agent with LangGraph: Step Replay and Time-Travel Checkpoints


In this tutorial, we aim to understand how LangGraph enables us to manage conversation flows in a structured manner, while also providing the power to “time travel” through checkpoints. By building a chatbot that integrates a free Gemini model and a Wikipedia tool, we can add multiple steps to a dialogue, record each checkpoint, replay the full state history, and even resume from a past state. This hands-on approach enables us to see, in real-time, how LangGraph’s design facilitates the tracking and manipulation of conversation progression with clarity and control. Check out the FULL CODES here.

!pip -q install -U langgraph langchain langchain-google-genai google-generativeai typing_extensions
!pip -q install "requests==2.32.4"


import os
import json
import textwrap
import getpass
import time
from typing import Annotated, List, Dict, Any, Optional


from typing_extensions import TypedDict


from langchain.chat_models import init_chat_model
from langchain_core.messages import BaseMessage
from langchain_core.tools import tool


from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.prebuilt import ToolNode, tools_condition


import requests
from requests.adapters import HTTPAdapter, Retry


if not os.environ.get("GOOGLE_API_KEY"):
   os.environ["GOOGLE_API_KEY"] = getpass.getpass("🔑 Enter your Google API Key (Gemini): ")


llm = init_chat_model("google_genai:gemini-2.0-flash")

We start by installing the required libraries, setting up our Gemini API key, and importing all the necessary modules. We then initialize the Gemini model using LangChain so that we can use it as the core LLM in our LangGraph workflow.

WIKI_SEARCH_URL = "https://en.wikipedia.org/w/api.php"


_session = requests.Session()
_session.headers.update({
   "User-Agent": "LangGraph-Colab-Demo/1.0 (contact: [email protected])",
   "Accept": "application/json",
})
retry = Retry(
   total=5, connect=5, read=5, backoff_factor=0.5,
   status_forcelist=(429, 500, 502, 503, 504),
   allowed_methods=("GET", "POST")
)
_session.mount("https://", HTTPAdapter(max_retries=retry))
_session.mount("http://", HTTPAdapter(max_retries=retry))


def _wiki_search_raw(query: str, limit: int = 3) -> List[Dict[str, str]]:
   """
   Use MediaWiki search API with:
     - origin='*' (good practice for CORS)
     - Polite UA + retries
   Returns compact list of {title, snippet_html, url}.
   """
   params = {
       "action": "query",
       "list": "search",
       "format": "json",
       "srsearch": query,
       "srlimit": limit,
       "srprop": "snippet",
       "utf8": 1,
       "origin": "*",
   }
   r = _session.get(WIKI_SEARCH_URL, params=params, timeout=15)
   r.raise_for_status()
   data = r.json()
   out = []
   for item in data.get("query", {}).get("search", []):
       title = item.get("title", "")
       page_url = f"https://en.wikipedia.org/wiki/{title.replace(' ', '_')}"
       snippet = item.get("snippet", "")
       out.append({"title": title, "snippet_html": snippet, "url": page_url})
   return out


@tool
def wiki_search(query: str) -> List[Dict[str, str]]:
   """Search Wikipedia and return up to 3 results with title, snippet_html, and url."""
   try:
       results = _wiki_search_raw(query, limit=3)
       return results if results else [{"title": "No results", "snippet_html": "", "url": ""}]
   except Exception as e:
       return [{"title": "Error", "snippet_html": str(e), "url": ""}]


TOOLS = [wiki_search]

We set up a Wikipedia search tool with a custom session, retries, and a polite user-agent. We define _wiki_search_raw to query the MediaWiki API and then wrap it as a LangChain tool, allowing us to seamlessly call it within our LangGraph workflow.

class State(TypedDict):
   messages: Annotated[list, add_messages]


graph_builder = StateGraph(State)


llm_with_tools = llm.bind_tools(TOOLS)


SYSTEM_INSTRUCTIONS = textwrap.dedent("""
You are ResearchBuddy, a careful research assistant.
- If the user asks you to "research", "find info", "latest", "web", or references a library/framework/product,
 you SHOULD call the `wiki_search` tool at least once before finalizing your answer.
- When you call tools, be concise in the text you produce around the call.
- After receiving tool results, cite at least the page titles you used in your summary.
""").strip()


def chatbot(state: State) -> Dict[str, Any]:
   """Single step: call the LLM (with tools bound) on the current messages."""
   return {"messages": [llm_with_tools.invoke(state["messages"])]}


graph_builder.add_node("chatbot", chatbot)
graph_builder.add_node("tools", ToolNode(tools=TOOLS))

graph_builder.add_edge(START, "chatbot")
graph_builder.add_conditional_edges("chatbot", tools_condition)
graph_builder.add_edge("tools", "chatbot")

memory = InMemorySaver()
graph = graph_builder.compile(checkpointer=memory)

We define our graph state to store the running message thread and bind our Gemini model to the wiki_search tool, allowing it to call it when needed. We add a chatbot node and a tools node, wire them with conditional edges, and enable checkpointing with an in-memory saver. We now compile the graph so we can add steps, replay history, and resume from any checkpoint.

def print_last_message(event: Dict[str, Any]):
   """Pretty-print the last message in an event if available."""
   if "messages" in event and event["messages"]:
       msg = event["messages"][-1]
       try:
           if isinstance(msg, BaseMessage):
               msg.pretty_print()
           else:
               role = msg.get("role", "unknown")
               content = msg.get("content", "")
               print(f"\n[{role.upper()}]\n{content}\n")
       except Exception:
           print(str(msg))


def show_state_history(cfg: Dict[str, Any]) -> List[Any]:
   """Print a concise view of checkpoints; return the list as well."""
   history = list(graph.get_state_history(cfg))
   print("\n=== 📜 State history (most recent first) ===")
   for i, st in enumerate(history):
       n = st.next
       n_txt = f"{n}" if n else "()"
       print(f"{i:02d}) NumMessages={len(st.values.get('messages', []))}  Next={n_txt}")
   print("=== End history ===\n")
   return history


def pick_checkpoint_by_next(history: List[Any], node_name: str = "tools") -> Optional[Any]:
   """Pick the first checkpoint whose `next` includes a given node (e.g., 'tools')."""
   for st in history:
       nxt = tuple(st.next) if st.next else tuple()
       if node_name in nxt:
           return st
   return None

We add utility functions to make our LangGraph workflow easier to inspect and control. We use print_last_message to neatly display the most recent response, show_state_history to list all saved checkpoints, and pick_checkpoint_by_next to locate a checkpoint where the graph is about to run a specific node, such as the tools step.

config = {"configurable": {"thread_id": "demo-thread-1"}}


first_turn = {
   "messages": [
       {"role": "system", "content": SYSTEM_INSTRUCTIONS},
       {"role": "user", "content": "I'm learning LangGraph. Could you do some research on it for me?"},
   ]
}


print("\n==================== 🟢 STEP 1: First user turn ====================")
events = graph.stream(first_turn, config, stream_mode="values")
for ev in events:
   print_last_message(ev)


second_turn = {
   "messages": [
       {"role": "user", "content": "Ya. Maybe I'll build an agent with it!"}
   ]
}


print("\n==================== 🟢 STEP 2: Second user turn ====================")
events = graph.stream(second_turn, config, stream_mode="values")
for ev in events:
   print_last_message(ev)

We simulate two user interactions in the same thread by streaming events through the graph. We first provide system instructions and ask the assistant to research LangGraph, then follow up with a second user message about building an agent. Each step is checkpointed, allowing us to replay or resume from these states later.

print("\n==================== 🔁 REPLAY: Full state history ====================")
history = show_state_history(config)


to_replay = pick_checkpoint_by_next(history, node_name="tools")
if to_replay is None:
   to_replay = history[min(2, len(history) - 1)]


print("Chosen checkpoint to resume from:")
print("  Next:", to_replay.next)
print("  Config:", to_replay.config)


print("\n==================== ⏪ RESUME from chosen checkpoint ====================")
for ev in graph.stream(None, to_replay.config, stream_mode="values"):
   print_last_message(ev)


MANUAL_INDEX = None  # set to an integer index from the history printed above to resume manually
if MANUAL_INDEX is not None and 0 <= MANUAL_INDEX < len(history):
   chosen = history[MANUAL_INDEX]
   print(f"\n==================== 🧭 MANUAL RESUME @ index {MANUAL_INDEX} ====================")
   print("Next:", chosen.next)
   print("Config:", chosen.config)
   for ev in graph.stream(None, chosen.config, stream_mode="values"):
       print_last_message(ev)


print("\n✅ Done. You added steps, replayed history, and resumed from a prior checkpoint.")

We replay the full checkpoint history to see how our conversation evolves across steps and identify a useful point to resume. We then “time travel” by restarting from a selected checkpoint, and optionally from any manual index, so we continue the dialogue exactly from that saved state.

In conclusion, we have gained a clearer picture of how LangGraph’s checkpointing and time-travel capabilities bring flexibility and transparency to conversation management. By stepping through multiple user turns, replaying state history, and resuming from earlier points, we can experience firsthand the power of this framework in building reliable research agents or autonomous assistants. We recognize that this workflow is not just a demo, but a foundation that we can extend into more complex applications, where reproducibility and traceability are as important as the answers themselves.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.


