AI Research
Multi-turn conversations with Action-Based Contrastive Self-Training
Are action-based preferences necessary? One of the key factors of ACT is that the contrastive pairs highlight differences between conversational actions. In “ACT w/ Random Actions”, we additionally examine the importance of action selection by randomly sampling both the winning and losing action when constructing the preference pair, and observe this underperforms normal ACT.
Do we need on-policy sampling? In “ACT w/o on-policy sampling”, we examine the importance of on-policy sampling by evaluating normal off-policy DPO on the dataset as constructed in Phase 1. While we do observe some improvements over SFT (e.g., from 69.0 to 74.8 Macro F1), the overall improvements are much larger when using on-policy sampling as with full ACT. This may be due to the fact that the off-policy negative responses are not guaranteed to lie in the language manifold of the policy model, and distribution shift may be too difficult to overcome with off-policy learning.
Is trajectory simulation necessary? ACT is better-aligned with multi-turn conversations due to its trajectory simulation. Without multi-turn simulation, our approach can be viewed similarly to on-policy DPO variants like IRPO, but with a conversation-specific reward signal which accounts for conversation actions and task heuristics. In “ACT w/ sampling w/o simulation”, we find that this trajectory-level simulation is critical to improving multi-turn performance, especially the policy model’s ability to reason about its own clarification questions.
Is ACT model agnostic? The base model in our main experiments, Zephyr, is obtained by aligning Mistral. In “ACT with unaligned foundation models” we observe a performance gap of 6.5 Action F1 and 4.3 Trajectory F1 after ACT tuning for the two models. However, our results demonstrate ACT can improve performance regardless of pre-existing alignment with human feedback, although it can help as an improved model initialization. Overall, we find that improving base model performance with ACT is model agnostic.
AI Research
The new frontier of medical malpractice
Although the beginnings of modern artificial intelligence (AI) can be traced
as far back as 1956, modern generative AI, the most famous example of which is
arguably ChatGPT, only began emerging in 2019. For better or worse, the steady
rise of generative AI has increasingly impacted the medical field. At this time, AI has begun to advance in a way that creates
potential liability…
AI Research
Pharmaceutical Innovation Rises as Global Funding Surges and AI Reshapes Clinical Research – geneonline.com
AI Research
Radiomics-Based Artificial Intelligence and Machine Learning Approach for the Diagnosis and Prognosis of Idiopathic Pulmonary Fibrosis: A Systematic Review – Cureus
-
Funding & Business7 days ago
Kayak and Expedia race to build AI travel agents that turn social posts into itineraries
-
Jobs & Careers7 days ago
Mumbai-based Perplexity Alternative Has 60k+ Users Without Funding
-
Mergers & Acquisitions7 days ago
Donald Trump suggests US government review subsidies to Elon Musk’s companies
-
Funding & Business6 days ago
Rethinking Venture Capital’s Talent Pipeline
-
Jobs & Careers6 days ago
Why Agentic AI Isn’t Pure Hype (And What Skeptics Aren’t Seeing Yet)
-
Funding & Business4 days ago
Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%
-
Jobs & Careers6 days ago
Astrophel Aerospace Raises ₹6.84 Crore to Build Reusable Launch Vehicle
-
Jobs & Careers6 days ago
Telangana Launches TGDeX—India’s First State‑Led AI Public Infrastructure
-
Funding & Business1 week ago
From chatbots to collaborators: How AI agents are reshaping enterprise work
-
Funding & Business4 days ago
Dust hits $6M ARR helping enterprises build AI agents that actually do stuff instead of just talking