AI Research
Multi-turn conversations with Action-Based Contrastive Self-Training
Are action-based preferences necessary? One of ACT's key properties is that its contrastive pairs highlight differences between conversational actions. In “ACT w/ Random Actions”, we examine the importance of action selection by randomly sampling both the winning and losing action when constructing each preference pair, and observe that this variant underperforms standard ACT.
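As a concrete illustration, here is a minimal sketch of action-based versus random pair construction (the action labels and function are hypothetical, not code from the paper):

```python
import random

def build_preference_pair(candidates, target_action, random_actions=False):
    """Pick a (winner, loser) response pair from sampled candidates.

    candidates: list of (response_text, action) tuples from the policy model.
    target_action: the action (e.g., "CLARIFY" vs. "ANSWER") that the
        dialogue state actually calls for.
    random_actions=True mimics the "ACT w/ Random Actions" ablation, where
    the pair no longer contrasts the right action against a wrong one.
    """
    if random_actions:
        winner, loser = random.sample(candidates, 2)
        return winner[0], loser[0]
    # Action-based selection: winner takes the target action, loser does not.
    winner = next(r for r, a in candidates if a == target_action)
    loser = next(r for r, a in candidates if a != target_action)
    return winner, loser
```

Under random selection the winning response may itself carry the wrong action, so the contrastive signal about when to clarify versus answer is lost.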
Do we need on-policy sampling? In “ACT w/o on-policy sampling”, we examine the importance of on-policy sampling by evaluating standard off-policy DPO on the dataset as constructed in Phase 1. While we do observe some improvement over SFT (e.g., from 69.0 to 74.8 Macro F1), the gains are much larger with on-policy sampling, as in full ACT. This may be because the off-policy negative responses are not guaranteed to lie on the language manifold of the policy model, and the resulting distribution shift may be too difficult to overcome with off-policy learning.
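For reference, the per-pair DPO objective that both variants optimize can be sketched as follows (a standard formulation of the DPO loss, not code from the paper):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_w / logp_l:         policy log-probs of winning / losing response
    ref_logp_w / ref_logp_l: reference-model log-probs of the same responses
    beta:                    strength of the implicit KL constraint
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

The difference between the variants is where the responses come from: on-policy ACT scores pairs freshly sampled from the policy itself, while the off-policy ablation scores pre-built dataset responses the policy may never actually generate.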
Is trajectory simulation necessary? ACT is better-aligned with multi-turn conversations due to its trajectory simulation. Without multi-turn simulation, our approach can be viewed similarly to on-policy DPO variants like IRPO, but with a conversation-specific reward signal which accounts for conversation actions and task heuristics. In “ACT w/ sampling w/o simulation”, we find that this trajectory-level simulation is critical to improving multi-turn performance, especially the policy model’s ability to reason about its own clarification questions.
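A rough sketch of what trajectory-level simulation adds (the policy and user simulator below are hypothetical stubs, not the paper's implementation):

```python
def simulate_trajectory(policy, user_sim, question, max_turns=3):
    """Roll out a multi-turn conversation so a reward can score the whole
    trajectory, not just the next utterance.

    policy:   callable(history) -> (reply_text, action)
    user_sim: callable(history) -> simulated user reply to a clarification
    """
    history = [("user", question)]
    for _ in range(max_turns):
        reply, action = policy(history)
        history.append(("assistant", reply))
        if action == "ANSWER":  # terminal action: final answer was given
            break
        # The assistant asked a clarifying question; simulate the user's answer.
        history.append(("user", user_sim(history)))
    return history
```

A reward computed on the completed history can then credit a clarifying question for the answer it eventually enables, which single-turn sampling without simulation cannot do.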
Is ACT model agnostic? The base model in our main experiments, Zephyr, is obtained by aligning Mistral. In “ACT with unaligned foundation models”, we observe a gap of 6.5 Action F1 and 4.3 Trajectory F1 between the two models after ACT tuning. Our results thus demonstrate that ACT can improve performance regardless of whether the base model has pre-existing alignment from human feedback, although such alignment helps as a stronger model initialization. Overall, we find that ACT's improvements over the base model are model agnostic.
Microsoft announces public preview of deep research in Azure AI foundry
Synopsis
This tool aims to accelerate research and deployment of frontier AI technologies, further strengthening Microsoft’s position in enterprise-grade generative AI solutions.
Deep Research And More From New AI Models
Did you hate these annoying songs? Check out what new deep research AI tools can do.
It’s all well and good to tell people to get on ChatGPT, or Claude, or Gemini, and get their hands dirty using an AI model – but some people want more detail. They want to know how to use the best new tools in a more granular way.
Deep research is one of the newest and most impressive functions of today’s large language models. But most people haven’t heard of it yet, let alone used it.
Ethan Mollick is someone I associate with the vanguard of AI research. A Wharton professor with MIT roots, Mollick is a leading voice in this field. So I was reading his latest blog post about the different models and how to use them.
Mollick breaks down the most useful models from Anthropic (Claude), OpenAI (ChatGPT) and Google into the following categories: “chat,” “work” and “hard problems.”
Claude, he notes, is generally better for privacy; in ChatGPT you need to opt in to privacy protections, and with Google you may give up some functionality to get them.
Then Mollick describes the deep research capabilities baked into some of the newest models, which are not yet available for free. The feature is so new that it’s still paywalled, and rationed even for paying users.
The Promise of Deep Research Functionality
Deep research is available from OpenAI, and from Google, and from Anthropic, too.
Look up the unveiling of the research function and you’ll find that most of the coverage clusters around February.
So how do you access this kind of result?
I found a recent post about this from the “OpenAI cookbook” in June, where the poster talks about downloading a Python SDK, and API work.
It turns out, though, that you can select “deep research” in the prompt section of ChatGPT 4.5, if you are a paying Plus or Pro user, and get your deep research report that way.
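For those who would rather go the API route described in that cookbook post, a minimal sketch follows. It only assembles the request; the model name, tool type, and `background` flag are assumptions based on the cookbook's deep-research pattern and may differ in practice.

```python
# Sketch of requesting a deep research report via the OpenAI Python SDK's
# Responses API, following the pattern in the cookbook post mentioned above.
# The model name and tool configuration here are assumptions, not verified.

def build_deep_research_request(topic: str) -> dict:
    """Assemble kwargs for client.responses.create(**kwargs)."""
    return {
        "model": "o3-deep-research",  # assumed deep-research model name
        "input": f"Please create a report on {topic}.",
        "tools": [{"type": "web_search_preview"}],  # let the model browse
        "background": True,  # long research jobs run asynchronously
    }

req = build_deep_research_request(
    "the most annoying music from 1990 until 2025, segmented annually"
)
# With an API key configured, you would then run something like:
#   from openai import OpenAI
#   response = OpenAI().responses.create(**req)
```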
Asking for Deep Research from ChatGPT 4.5
I wanted to try this for myself, so, based on what I was experiencing in the moment, I settled on compiling a report of the most annoying songs from 1990 through 2025.
My query: “please create a report on the most annoying music from 1990 until 2025, segmented annually.”
Simple, right?
First, ChatGPT asked me to clarify a few basic questions:
“To create a detailed report on the ‘most annoying music’ from 1990 to 2025, could you please clarify a few things:
- What do you mean by ‘annoying’? Are you referring to songs widely criticized, overplayed, polarizing, or based on listener polls or memes?
- Should the list be global or focus on a specific region (e.g., U.S., UK, etc.)?
- Do you want one song per year, or multiple per year if applicable?
- Would you like the report to include explanations or cultural context for each selection?”
And I answered:
“1. all of the above 2. US-based 3. multiple songs per year 4. Yes.”
“Thanks!” GPT responded. “I’ll compile a U.S.-focused report covering the most widely criticized, overplayed, or culturally mocked songs from 1990 to 2025. This will include multiple songs per year, along with context explaining why they were considered annoying—drawing from music critics, listener polls, memes, radio overplay complaints, and public opinion.”
Then the report started. It took more than 10 minutes to compile, with CoT displays like “searching ‘Rico Suave’ worst song” and “looking for Macarena.”
Check Out GPT’s Chain of Thought
Here are some of the interesting CoT notes the system provided while doing research across dozens of sites (it spent a lot of time reading the Guardian!):
I’m mapping out recent headlines and thought pieces, like those from Global News and the Guardian, all noting “Baby Shark” as the most annoying song. Hm, that’s interesting.
I’m preparing the list of most annoying songs from 1993-1995, considering “What’s Up,” “Cotton Eye Joe,” “Macarena,” and “Wonderwall.” However, I’m thinking through the inclusion of other notable tracks like “Rico Suave” for earlier years.
(There’s that ‘Rico Suave’ song again.)
I’m mapping out the hits and misses of 2001-2006, with 2005’s viral sensations like “Crazy Frog” and “Photograph” by Nickelback taking center stage. Hm, that’s interesting—these tracks seem to evoke strong reactions.
(Note how the model is actually ‘taking notes’ while researching, giving us little bits of personality along with the actual meat of the report.)
Here’s a funny comparison:
I’m comparing the impact of Nickelback’s “Rockstar” in 2006 to Soulja Boy’s “Crank That” in 2007. The former was widely criticized, while the latter was loved by many, raising questions about musical taste over time.
A research-based note:
I’m digging through songs by genre and year, considering widespread opinion and specific examples to select the most frequently labeled annoying tracks.
I’m evaluating hits like Meghan Trainor’s “All About That Bass” (2014) and Tyga’s “Stimulated” (2015) for their ubiquity and frequency of being labeled annoying. Progressing through last five years gradually.
And then there was this:
“Noticing the challenge of time limitations, I’m figuring out how to balance inclusion of pivotal examples against the need for conciseness.”
So the model is actually trying to triage its work. This is next-level stuff.
Finally, GPT appeared to wrap up, dotting its I’s and crossing its Ts:
“I’m checking to see if the ‘Achy Breaky’ song contains any references to anthropomorphic sharks.”
Eventually, I got the full report. It runs to dozens of pages and provides great coverage of three and a half decades of horrible, terrible music. A few standouts, with years:
1990
- “Ice Ice Baby” – Vanilla Ice (1990): This hip-hop crossover hit became a pop culture punchline due to its cheesy lyrics and ubiquitous play. While it was initially a chart-topping success, critics later lambasted it; Spinner magazine ranked it the second-worst song in history, and Blender ranked it fifth (en.wikipedia.org). A Houston Press critic even dubbed it the worst song ever by a Texas artist, claiming it “set back the cause of white people in hip-hop by a decade” (en.wikipedia.org). Its overexposure turned the once-popular anthem into an annoyance for many listeners (craftyourhappyplace.com).
- “Rico Suave” – Gerardo (1990): A bilingual novelty rap track that peaked in 1991, “Rico Suave” was a one-hit wonder often cited on “worst song” lists. Its goofy, exaggerated Latin lover persona and repetitive chorus wore thin quickly. Blender magazine’s infamous list of the “50 Worst Songs Ever” placed “Rico Suave” among its ranks (in the high 30s), and retrospective pop culture forums frequently hold it up as an example of early ’90s pop cringe (culturecrossfire.com, forums.thefashionspot.com). The song’s catchphrase chorus (and its shirtless music video) made it memorable, but also widely mocked in the years since.
1994
- “Cotton Eye Joe” – Rednex (1994): This techno-hoedown remix of an old folk tune became a staple at sports games and parties – much to many people’s annoyance. With its frenetic fiddle loops and the grating “If it hadn’t been for Cotton Eye Joe” chorus, the track wears thin fast. CBS News put it at #1 on a “Worst Songs from Jock Jams” list, calling it an “unholy hybrid of country-fried fiddle kitsch and relentless hi-NRG drums” (en.wikipedia.org). A Toronto Star critic dubbed it the worst song of the ’90s, describing it as “straight drivel” (en.wikipedia.org). Even wrestler-turned-rocker Chris Jericho said it’s one of his least favorite songs ever – naming it alongside “Mambo No. 5” on his personal worst list (en.wikipedia.org). Despite its party ubiquity, few ever actually wanted an “EDM-hillbilly” mashup like this (en.wikipedia.org).
1997
- “Barbie Girl” – Aqua (1997): This novelty dance-pop hit was simultaneously a global chart-topper and a critical punching bag. Its bouncy, plastic sound and exaggerated vocals (“Come on Barbie, let’s go party!”) made it a staple of late-’90s radio and kids’ parties, but also an irritant to many. Rolling Stone readers voted it the worst song of the ’90s (en.wikipedia.org), and NME put it atop their worst-single list as well (en.wikipedia.org). While Aqua intended it as satire, the song was widely mocked; it even sparked a lawsuit from Mattel for tarnishing the Barbie brand. Decades later, it remains a guilty pleasure earworm that routinely tops “most annoying” song lists (craftyourhappyplace.com).
Fast-forward a bit:
2013
- “The Fox (What Does the Fox Say?)” – Ylvis (2013): An absurdist novelty song by a Norwegian comedy duo, “The Fox” unexpectedly became a global viral hit in late 2013. Its EDM-lite beat and litany of animal sounds (ring-ding-ding, wa-pa-pa-pow!) made it the “Macarena” of the YouTube era – funny at first, then torturous on repeat. Kids loved the silliness, while parents and many others quickly tired of it. The song was essentially one long joke about animal noises, and once the joke wore off, what remained was an insanely repetitive chorus of nonsense. Rolling Stone readers later voted it one of the most annoying songs of the 2010s, and USA Today called it “the year’s most aggravating meme song.” Like other meme hits, it got played at school events and even on radio as a curiosity, amplifying the annoyance for those who never wanted to hear “Hatee-hatee-hatee-ho!” again. Ylvis intended it as a parody of EDM pop – ironically achieving exactly the kind of ubiquitous earworm they were spoofing, to the annoyance of millions.
- “Baby Shark” (global viral hit – charted 2018/2019): {See 2016 above.} In 2018, “Baby Shark” finally chomped its way onto international music charts, hitting the US Top 40 by early 2019 (theguardian.com). This was the year the song truly went pandemic, crossing from kids’ YouTube into mainstream consciousness. By the fall of 2018, articles were calling it “the year’s most annoying song” outright (theguardian.com). Parents everywhere pleaded for mercy from the doo-doo-doos. The Guardian wryly noted how even people who hadn’t heard it knew of it through cultural osmosis – often via tales of toddlers playing it on endless repeat, driving their families mad (theguardian.com). Once it hit the radio and dance remixes appeared, there was no escape. “Baby Shark” in 2018 exemplified how a simple children’s ditty could unite millions in shared, head-throbbing frustration. (Fun fact: that year a viral social post joked that the Geneva Convention should classify repeated “Baby Shark” play as a form of torture – a sentiment many a worn-out babysitter would second.) (globalnews.ca, theguardian.com)
2023
- “Peaches” – Jack Black (Bowser) (2023): Featured in The Super Mario Bros. Movie, this over-the-top piano ballad sung by Jack Black (in character as Bowser) became a TikTok smash and even entered the Billboard Hot 100. It’s intentionally comedic – Bowser professing love for Princess Peach with a dramatic “Peaches, Peaches, Peaches” chorus. Kids and gamers found it hilarious and catchy; others quickly found it grating (especially if they hadn’t seen the movie context). The chorus is essentially one word (peaches) repeated incessantly with operatic bombast. Within weeks, clips of people annoyed by their friends or children endlessly singing “Peaches” popped up online. Though short, the song’s earworm power is strong. One AV Club article quipped that it’s the kind of novelty song “destined to annoy parents everywhere while their offspring replay it 1,000 times.” It’s a fresh example of how a song can go viral and charm its way into your brain – and then, thanks to sheer overuse, make you never want to hear the word “peach” again (sorry, Georgia).
Some Thoughts on Voice Mode
Let’s move back to the blog post: after delving into the above stuff, Mollick also covers the verbal capabilities of new systems that really change the interface. As he notes, you can talk to these models while you’re in the car, or on a walk, or anywhere that you have the space and time.
I think verbal communication is going to finally make LLM responses “real” to us. And that’s all I have to say about that – for now.
I think all of this is deeply informative for people who want to be practical about LLM use. The deep research function in particular is something people will be learning about for some time. How many of us will figure it out before something new comes along that makes it obsolete?
In any case, this was an excellent guide to how to make the most of AI in today’s environment.
AI can make anyone rich: Mark Cuban says it could turn ‘just one dude in a basement’ into a trillionaire
And thanks to artificial intelligence, the trend of developing blockbuster companies at home is sure to continue. Mark Cuban, the billionaire former principal owner of the Dallas Mavericks and Shark Tank star, recently said AI could help make the world’s first trillionaire.
“We haven’t seen the best or the craziest of what [AI is] going to be able to do,” Cuban told the High Performance podcast in an episode published June 29. “And not only do I think it’ll create a trillionaire, but it could be just one dude in the basement. That’s how crazy it could be.”
Take OpenAI, for example, which was formed in cofounder and President Greg Brockman’s living room in 2015. OpenAI is currently valued at $300 billion thanks to a $40 billion fund-raising deal in March. Sam Altman, the CEO of OpenAI, is worth at least $2 billion.
While that’s a far cry from AI creating the world’s first trillionaire, a September 2024 report by wealth-tracking service Informa Connect said Tesla CEO Elon Musk is on pace to become a trillionaire by 2027. Musk also has his own AI startup xAI, which has an estimated $80 billion valuation.
“There’s always something bigger and better that’s created by an innovative entrepreneur,” Cuban said. “But AI just dwarfs all that.”
How Mark Cuban uses AI
Cuban appears to firmly believe in the power of AI, and said we’re only in the “preseason” of what the technology can really do.
“As it becomes more advanced—and I’m not saying we’re going to get the Terminator— I’m not saying all of a sudden there’s going to be robots that are smarter than people, like the movie,” he told High Performance. “But we’ll find ways to make our lives better, more interesting, to work better, more effectively.”
Cuban also said he uses AI for nearly everything. He gave an example of how he uses it to track health metrics since he suffers from atrial fibrillation (A-Fib).
He had to track when he was taking medications and working out and used ChatGPT to help him record it. He also asked ChatGPT to let him know if there’s something he writes down that doesn’t seem right or that he should be concerned about.
“And damn if it doesn’t do it,” Cuban said. “Now you’ve got to be careful. It’s like talking to a friend who you think knows a lot about something. You’ve still got to be careful and talk to an expert, but being able to do the process and just having things identified that I might not have known to look for was insane—just insane.”
Cuban touches on one of the larger concerns when it comes to relying on AI for making important health- or life-related decisions. While ChatGPT has served as an accessible means for getting advice, and even therapy, experts have warned the technology still has limitations and makes mistakes.
To combat this, Cuban said he challenges the AI tools he’s using with more questions and does his own research.
“There’s times when I don’t think the answer is what it should be, or I think it’s wrong. And I’ll say, ‘I don’t think you’re right. I need you to show me the evidence of how you got to this decision,’” he explained. “And it’s no different than talking to a friend or a work colleague and just calling bullsh-t.”
And on the hot topic of whether AI will take everyone’s jobs, Cuban said no.
“I can just go on all the sh-t I’m trying with AI, and I’m not here to tell you that it’s going to replace everybody’s job. It won’t,” he predicted. “But the things that you like to do, if you’re creative, innovative, whatever it may be—or you’re just bored—it might end up being the best alternative to boredom.”