AI Insights
Effective cross-lingual LLM evaluation with Amazon Bedrock
Evaluating the quality of AI responses across multiple languages presents significant challenges for organizations deploying generative AI solutions globally. How can you maintain consistent performance when human evaluations require substantial resources, especially across diverse languages? Many companies find themselves struggling to scale their evaluation processes without compromising quality or breaking their budgets.
Amazon Bedrock Evaluations offers an efficient solution through its LLM-as-a-judge capability, so you can assess AI outputs consistently across linguistic barriers. This approach reduces the time and resources typically required for multilingual evaluations while maintaining high-quality standards.
In this post, we demonstrate how to use the evaluation features of Amazon Bedrock to deliver reliable results across language barriers without the need for localized prompts or custom infrastructure. Through comprehensive testing and analysis, we share practical strategies to help reduce the cost and complexity of multilingual evaluation while maintaining high standards across global large language model (LLM) deployments.
Solution overview
To scale and streamline the evaluation process, we used Amazon Bedrock Evaluations, which offers both automatic and human-based methods for assessing model and Retrieval Augmented Generation (RAG) system quality. To learn more, see Evaluate the performance of Amazon Bedrock resources.
Automatic evaluations
Amazon Bedrock supports two modes of automatic evaluation: model evaluation and RAG evaluation.
For LLM-as-a-judge evaluations, you can choose from a set of built-in metrics or define your own custom metrics tailored to your specific use case. You can run these evaluations on models hosted in Amazon Bedrock or on external models by uploading your own prompt-response pairs.
Human evaluations
For use cases that require subject-matter expert judgment, Amazon Bedrock also supports human evaluation jobs. You can assign evaluations to human experts, and Amazon Bedrock manages task distribution, scoring, and result aggregation.
Human evaluations are especially valuable for establishing a baseline against which automated scores, like those from judge model evaluations, can be compared.
Evaluation dataset preparation
We used the Indonesian splits from the SEA-MTBench dataset, which is based on MT-Bench, a widely used benchmark for conversational AI assessment. The Indonesian version was manually translated by native speakers and consists of 58 records covering a diverse range of categories such as math, reasoning, and writing.
We converted the multi-turn conversations into single-turn interactions while preserving context, so that each turn could be evaluated independently with consistent context. This conversion resulted in 116 records for evaluation. The sketch below illustrates the idea behind this conversion.
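The following is a minimal sketch of one way to perform this kind of conversion. It assumes MT-Bench-style records with an `id` and a two-item `turns` list; the field names and the `flatten_record` helper are illustrative, not the exact SEA-MTBench schema or the code we used.

```python
# Illustrative sketch: flatten two-turn MT-Bench-style records into
# single-turn records, carrying the first turn (and its answer) forward
# as context for the second turn. Field names are assumptions, not the
# exact SEA-MTBench schema.
import json

def flatten_record(record, first_turn_answer):
    """Turn one two-turn record into two single-turn evaluation items."""
    turn_1, turn_2 = record["turns"][:2]
    return [
        {"id": f"{record['id']}-t1", "prompt": turn_1},
        {
            "id": f"{record['id']}-t2",
            # Preserve the first exchange as context for the second turn.
            "prompt": (
                f"Previous question: {turn_1}\n"
                f"Previous answer: {first_turn_answer}\n\n"
                f"Follow-up question: {turn_2}"
            ),
        },
    ]

if __name__ == "__main__":
    record = {
        "id": "mtb-001",
        "turns": ["Write a short poem about rain.", "Now rewrite it as a haiku."],
    }
    for item in flatten_record(record, first_turn_answer="(model answer to turn 1)"):
        print(json.dumps(item, ensure_ascii=False))
```

Embedding the first exchange in the second prompt keeps the follow-up turn self-contained, so either turn can be scored on its own.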
For each record, we generated responses using a stronger LLM (Model Strong-A) and a relatively weaker LLM (Model Weak-A). These outputs were later evaluated by both human annotators and LLM judges.
Establishing a human evaluation baseline
To assess evaluation quality, we first established a set of human evaluations as the baseline for comparing LLM-as-a-judge scores. A native-speaking evaluator rated each response from Model Strong-A and Model Weak-A on a 1–5 Likert helpfulness scale, using the same rubric applied in our LLM evaluator prompts.
We conducted manual evaluations on the full evaluation dataset using the human evaluation feature in Amazon Bedrock. Setting up human evaluations in Amazon Bedrock is straightforward: you upload a dataset and define the worker group, and Amazon Bedrock automatically generates the annotation UI and manages the scoring workflow and result aggregation.
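As one illustration of the dataset preparation step, prompt-response pairs can be assembled as JSONL before upload. The field names below (`prompt`, `responses`, `modelIdentifier`) are placeholders chosen for readability, not the exact schema Amazon Bedrock expects; the Evaluations documentation defines the required format.

```python
# Illustrative only: write prompt-response pairs as JSONL for a human
# evaluation job. Field names here are placeholders; the exact schema
# expected by Amazon Bedrock is defined in the Evaluations documentation.
import json

records = [
    {
        "prompt": "Jelaskan perbedaan antara cuaca dan iklim.",
        "responses": [
            {"modelIdentifier": "model-strong-a", "response": "..."},
            {"modelIdentifier": "model-weak-a", "response": "..."},
        ],
    },
]

with open("human_eval_dataset.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```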
The following screenshot shows a sample result from an Amazon Bedrock human evaluation job.
LLM-as-a-judge evaluation setup
We evaluated responses from Model Strong-A and Model Weak-A using four judge models: Model Strong-A, Model Strong-B, Model Weak-A, and Model Weak-B. These evaluations were run using custom metrics in an LLM-as-a-judge evaluation in Amazon Bedrock, which allows flexible prompt definition and scoring without the need to manage your own infrastructure.
Each judge model was given a custom evaluation prompt aligned with the same helpfulness rubric used in the human evaluation. The prompt asked the evaluator to rate each response on a 1–5 Likert scale based on clarity, task completion, instruction adherence, and factual accuracy. We prepared both English and Indonesian versions of the judge prompt to support multilingual testing.
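The managed LLM-as-a-judge jobs handle prompt orchestration and scoring for you, but conceptually each judgment boils down to a single judge call. The sketch below shows that step using the Amazon Bedrock Converse API via boto3; the rubric wording, the `judge_model_id` placeholder, and the score-parsing regex are illustrative assumptions rather than our exact custom metric definition.

```python
# Minimal LLM-as-a-judge sketch using the Amazon Bedrock Converse API.
# The rubric text, model ID, and score parsing are illustrative
# assumptions, not the exact custom metric used in the evaluation jobs.
import re
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

JUDGE_PROMPT = """You are evaluating the helpfulness of an AI assistant's answer.
Rate the answer on a 1-5 Likert scale considering clarity, task completion,
instruction adherence, and factual accuracy.
Respond with a short justification followed by "Score: <1-5>".

Question:
{question}

Answer:
{answer}
"""

def judge_helpfulness(question: str, answer: str, judge_model_id: str) -> int:
    """Ask a judge model to score one response; return the 1-5 score."""
    response = bedrock_runtime.converse(
        modelId=judge_model_id,
        messages=[{
            "role": "user",
            "content": [{"text": JUDGE_PROMPT.format(question=question, answer=answer)}],
        }],
        inferenceConfig={"temperature": 0.0, "maxTokens": 512},
    )
    text = response["output"]["message"]["content"][0]["text"]
    match = re.search(r"Score:\s*([1-5])", text)
    if match is None:
        raise ValueError(f"Could not parse a score from: {text!r}")
    return int(match.group(1))
```

In our experiments, the managed evaluation jobs in Amazon Bedrock performed this orchestration across the full dataset, so no scoring infrastructure had to be maintained.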
To measure alignment, we used two standard metrics:
- Pearson correlation – Measures the linear relationship between score values. Useful for detecting overall similarity in score trends.
- Cohen’s kappa (linear weighted) – Captures agreement between evaluators, adjusted for chance. Especially useful for discrete scales like Likert scores.
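As a sketch of how these alignment metrics can be computed, assuming the human and judge ratings are aligned lists of 1–5 scores (the example values are made up):

```python
# Compute alignment between human ratings and LLM-judge ratings.
# The score lists below are made-up illustrations, not our actual data.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human_scores = [5, 4, 3, 4, 2, 5, 3, 4]
judge_scores = [5, 4, 4, 4, 3, 5, 3, 5]

pearson_r, _p_value = pearsonr(human_scores, judge_scores)
weighted_kappa = cohen_kappa_score(
    human_scores, judge_scores, weights="linear"  # linear weighting suits ordinal Likert scores
)

print(f"Pearson correlation: {pearson_r:.2f}")
print(f"Cohen's kappa (linear weighted): {weighted_kappa:.2f}")
```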
Alignment between LLM judges and human evaluations
We began by comparing the average helpfulness scores given by each evaluator using the English judge prompt. The following chart shows the evaluation results.
When evaluating responses from the stronger model, LLM judges tended to agree with human ratings. But on responses from the weaker model, most LLMs gave noticeably higher scores than humans. This suggests that LLM judges tend to be more generous when response quality is lower.
We designed the evaluation prompt to guide models toward scoring behavior similar to human annotators, but score patterns still showed signs of potential bias. Model Strong-A rated its own outputs highly (4.93), whereas Model Weak-A gave its own responses a higher score than humans did. In contrast, Model Strong-B, which didn’t evaluate its own outputs, gave scores that were closer to human ratings.
To better understand alignment between LLM judges and human preferences, we analyzed Pearson and Cohen’s kappa correlations between them. On responses from Model Weak-A, alignment was strong. Model Strong-A and Model Strong-B achieved Pearson correlations of 0.45 and 0.61, with kappa scores of 0.33 and 0.4.
Alignment between LLM judges and human ratings on responses from Model Strong-A was more moderate: all evaluators had Pearson correlations between 0.26 and 0.33 and weighted kappa scores between 0.20 and 0.22. This might be due to limited variation in either human or model scores, which reduces the ability to detect strong correlation patterns.
To complete our analysis, we also conducted a qualitative deep dive. Amazon Bedrock makes this straightforward by providing JSONL outputs from each LLM-as-a-judge run that include both the evaluation score and the model’s reasoning. This helped us review evaluator justifications and identify cases where scores were incorrectly extracted or parsed.
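A lightweight way to review those outputs is to load the JSONL and pull out each score and its justification. The field names in this sketch (`score`, `reasoning`) are assumptions for illustration; the actual output schema is defined by Amazon Bedrock.

```python
# Illustrative parsing of LLM-as-a-judge output JSONL.
# The field names ("score", "reasoning") are assumptions used for
# illustration; the actual output schema is defined by Amazon Bedrock.
import json

def load_judge_results(jsonl_path: str):
    """Yield (score, reasoning) pairs from an evaluation output file."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield record.get("score"), record.get("reasoning")

# Example: flag records where the score failed to parse, for manual review.
# unparsed = [r for s, r in load_judge_results("judge_output.jsonl") if s is None]
```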
From this review, we identified several factors behind the misalignment between LLM and human judgments:
- Evaluator capability ceiling – In some cases, especially in reasoning tasks, the LLM evaluator couldn’t solve the original task itself. This made its evaluations flawed and unreliable at identifying whether a response was correct.
- Evaluation hallucination – In other cases, the LLM evaluator assigned low scores to correct answers not because of reasoning failure, but because it imagined errors or flawed logic in responses that were actually valid.
- Overriding instructions – Certain models occasionally overrode explicit instructions based on ethical judgment. For example, two evaluator models rated a response that created misleading political campaign content as very unhelpful (even though the response included its own warnings), whereas human evaluators rated it very helpful for following the task.
These problems highlight the importance of using human evaluations as a baseline and performing qualitative deep dives to fully understand LLM-as-a-judge results.
Cross-lingual evaluation capabilities
After analyzing evaluation results from the English judge prompt, we moved to the final step of our analysis: comparing evaluation results between English and Indonesian judge prompts.
We began by comparing overall helpfulness scores and alignment with human ratings. Helpfulness scores remained nearly identical for all models, with most shifts within ±0.05. Alignment with human ratings was also similar: Pearson correlations between human scores and LLM-as-a-judge using Indonesian judge prompts closely matched those using English judge prompts. In statistically meaningful cases, correlation score differences were typically within ±0.1.
To further assess cross-language consistency, we computed Pearson correlation and Cohen’s kappa directly between LLM-as-a-judge evaluation scores generated using English and Indonesian judge prompts on the same response set. The following tables show correlation between scores from Indonesian and English judge prompts for each evaluator LLM, on responses generated by Model Weak-A and Model Strong-A.
The first table summarizes the evaluation of Model Weak-A responses.
| Metric | Model Strong-A | Model Strong-B | Model Weak-A | Model Weak-B |
| --- | --- | --- | --- | --- |
| Pearson correlation | 0.73 | 0.79 | 0.64 | 0.64 |
| Cohen’s kappa (linear weighted) | 0.59 | 0.69 | 0.42 | 0.49 |
The next table summarizes the evaluation of Model Strong-A responses.
| Metric | Model Strong-A | Model Strong-B | Model Weak-A | Model Weak-B |
| --- | --- | --- | --- | --- |
| Pearson correlation | 0.41 | 0.80 | 0.51 | 0.70 |
| Cohen’s kappa (linear weighted) | 0.36 | 0.65 | 0.43 | 0.61 |
Correlation between evaluation results from both judge prompt languages was strong across all evaluator models. On average, Pearson correlation was 0.65 and Cohen’s kappa was 0.53 across all models.
We also conducted a qualitative review comparing evaluations from both evaluation prompt languages for Model Strong-A and Model Strong-B. Overall, both models showed consistent reasoning across languages in most cases. However, occasional hallucinated errors or flawed logic occurred at similar rates across both languages (we should note that humans make occasional mistakes as well).
One interesting pattern we observed with one of the stronger evaluator models was that it tended to follow the evaluation prompt more strictly in the Indonesian version. For example, it rated a response as unhelpful when the response refused to generate misleading political content, even though the task explicitly asked for it. This behavior differed from the English prompt evaluation. In a few cases, it also assigned noticeably stricter scores than the English evaluator prompt did, even though the reasoning in both languages was similar; these stricter scores were closer to how humans typically evaluated.
These results confirm that although prompt translation remains a useful option, it is not required to achieve consistent evaluation. You can rely on English evaluator prompts even for non-English outputs, for example by using the predefined and custom metrics of Amazon Bedrock LLM-as-a-judge, making multilingual evaluation simpler and more scalable.
Takeaways
The following are key takeaways for building a robust LLM evaluation framework:
- LLM-as-a-judge is a practical evaluation method – It offers faster, cheaper, and scalable assessments while maintaining reasonable judgment quality across languages. This makes it suitable for large-scale deployments.
- Choose a judge model based on practical evaluation needs – Across our experiments, stronger models aligned better with human ratings, especially on weaker outputs. However, even top models can misjudge harder tasks or show self-bias. Use capable, neutral evaluators to facilitate fair comparisons.
- Manual human evaluations remain essential – Human evaluations provide the reference baseline for benchmarking automated scoring and understanding model judgment behavior.
- Prompt design meaningfully shapes evaluator behavior – Aligning your evaluation prompt with how humans actually score improves quality and trust in LLM-based evaluations.
- Translated evaluation prompts are helpful but not required – English evaluator prompts reliably judge non-English responses, especially for evaluator models that support multilingual input.
- Always be ready to deep dive with qualitative analysis – Reviewing evaluation disagreements by hand helps uncover hidden model behaviors and reveals whether the statistical metrics tell the full story.
- Simplify your evaluation workflow using Amazon Bedrock evaluation features – Amazon Bedrock built-in human evaluation and LLM-as-a-judge evaluation capabilities simplify iteration and streamline your evaluation workflow.
Conclusion
Through our experiments, we demonstrated that LLM-as-a-judge evaluations can deliver consistent and reliable results across languages, even without prompt translation. With properly designed evaluation prompts, LLMs can maintain high alignment with human ratings regardless of evaluator prompt language. Although we focused on Indonesian, the results indicate that similar techniques are likely to be effective for other non-English languages; we encourage you to validate this on the languages you work with. This reduces the need to create localized evaluation prompts for every target audience.
To level up your evaluation practices, consider the following ways to extend your approach beyond foundation model scoring:
- Evaluate your Retrieval Augmented Generation (RAG) pipeline, assessing not just LLM responses but also retrieval quality using Amazon Bedrock RAG evaluation capabilities
- Evaluate and monitor continuously, and run evaluations before production launch, during live operation, and ahead of any major system upgrades
Begin your cross-lingual evaluation journey today with Amazon Bedrock Evaluations and scale your AI solutions confidently across global landscapes.
About the authors
Riza Saputra is a Senior Solutions Architect at AWS, working with startups of all stages to help them grow securely, scale efficiently, and innovate faster. His current focus is on generative AI, guiding organizations in building and scaling AI solutions securely and efficiently. With experience across roles, industries, and company sizes, he brings a versatile perspective to solving technical and business challenges. Riza also shares his knowledge through public speaking and content to support the broader tech community.
AI Insights
Global Artificial Intelligence (AI) in Clinical Trials Market
According to DelveInsight’s analysis, the demand for artificial intelligence in clinical trials is experiencing strong growth, primarily driven by the rising global prevalence of chronic conditions like diabetes, cardiovascular diseases, respiratory illnesses, and cancer. This growth is further supported by increased investments and funding dedicated to advancing drug discovery and development efforts. Additionally, the growing number of strategic collaborations and partnerships among pharmaceutical, biotechnology, and medical device companies is significantly boosting the adoption of AI-driven solutions in clinical trials. Together, these factors are anticipated to fuel the expansion of the AI in clinical trials market during the forecast period from 2025 to 2032.
DelveInsight’s “Artificial Intelligence (AI) in Clinical Trials Market Insights, Competitive Landscape and Market Forecast-2032” report provides the current and forecast market outlook, forthcoming device innovation, challenges, market drivers and barriers. The report also covers the major emerging products and key Artificial Intelligence (AI) in Clinical Trials companies actively working in the market.
To know more about why North America is leading the market growth in the Artificial Intelligence (AI) in Clinical Trials market, get a snapshot of the report Artificial Intelligence (AI) in Clinical Trials Market Trends
https://www.delveinsight.com/sample-request/ai-in-clinical-trials-market?utm_source=openpr&utm_medium=pressrelease&utm_campaign=gpr
Artificial Intelligence (AI) in Clinical Trials Overview
Artificial Intelligence (AI) in clinical trials refers to the use of advanced machine learning algorithms and data analytics to streamline and improve various aspects of clinical research. AI enhances trial design, patient recruitment, site selection, and data analysis by identifying patterns and predicting outcomes. It enables faster patient matching, optimizes protocol design, reduces trial timelines, and improves data quality and monitoring. AI also helps in real-time adverse event detection and adaptive trial management, making clinical trials more efficient, cost-effective, and patient-centric.
DelveInsight Analysis: The global Artificial Intelligence in clinical trials market size was valued at USD 1,350.79 million in 2024 and is projected to expand at a CAGR of 12.04% during 2025-2032, reaching approximately USD 3,334.47 million by 2032.
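As a rough sanity check, the compound annual growth formula, final value = initial value × (1 + CAGR)^years, approximately reproduces this projection; small differences stem from rounding of the reported CAGR and the exact base year assumed.

```python
# Rough sanity check of the projection using the compound-growth formula.
# Small differences from the reported USD 3,334.47 million figure come
# from rounding of the published CAGR and the base year assumed.
value_2024 = 1350.79   # USD million, reported 2024 market size
cagr = 0.1204          # reported CAGR for 2025-2032
years = 8              # 2024 -> 2032

projected_2032 = value_2024 * (1 + cagr) ** years
print(f"Projected 2032 market size: USD {projected_2032:,.2f} million")
# ~USD 3,354 million, in line with the reported ~USD 3,334 million
```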
Artificial Intelligence (AI) in Clinical Trials Market Insights
Geographically, North America is expected to lead the AI in clinical trials market in 2024, driven by several critical factors. The region’s growing burden of chronic diseases, substantial investments in R&D, and the rising volume of clinical trials contribute significantly to this dominance. Additionally, an increasing number of collaborations and partnerships among pharmaceutical and medical device companies, along with the advancement of sophisticated AI solutions, are accelerating market expansion. These developments are enhancing the ability to manage complex clinical trials efficiently, driving the adoption of AI technologies and supporting the market’s growth in North America throughout the forecast period from 2025 to 2032.
To read more about the latest highlights related to Artificial Intelligence (AI) in Clinical Trials, get a snapshot of the key highlights entailed in the Artificial Intelligence (AI) in Clinical Trials market report
https://www.delveinsight.com/report-store/ai-in-clinical-trials-market?utm_source=openpr&utm_medium=pressrelease&utm_campaign=gpr
Recent Developments in the Artificial Intelligence (AI) in Clinical Trials Market Report
• In May 2025, Avant Technologies, Inc. (OTCQB: AVAI) and joint venture partner Ainnova Tech, Inc. announced the initiation of acquisition discussions aimed at enhancing their presence in the rapidly growing AI-powered healthcare sector.
• In March 2025, Suvoda introduced Sofia, an AI-driven assistant created to optimize clinical trial management processes. Sofia aids study teams by providing quick access to essential trial data and real-time, intelligent insights. This tool boosts operational efficiency, minimizes manual tasks, and helps teams make faster, data-informed decisions throughout the clinical trial journey.
• In December 2024, ConcertAI and NeoGenomics unveiled CTO-H, an advanced AI-powered software platform designed to enhance research analytics, clinical trial design, and operational efficiency. CTO-H provides an extensive research data ecosystem, offering comprehensive longitudinal patient data, deep biomarker insights, and scalable analytics to support more precise, efficient, and data-driven clinical development processes.
• In June 2024, Lokavant introduced Spectrum™, the first AI-powered clinical trial feasibility solution aimed at enhancing trial performance throughout the clinical development process. Spectrum enables study teams to forecast, control, and improve trial timelines and expenses in real time.
• Thus, owing to such developments in the market, rapid growth will be observed in the Artificial Intelligence (AI) in Clinical Trials market during the forecast period.
Key Players in the Artificial Intelligence (AI) in Clinical Trials Market
Some of the key market players operating in the Artificial Intelligence (AI) in Clinical Trials market include- TEMPUS, NetraMark, ConcertAI, AiCure, Medpace, Inc., ICON plc, Charles River Laboratories, Dassault Systèmes, Oracle, Certara, Cytel Inc., Phesi, DeepHealth, Unlearn.ai, Inc., H1, TrialX, Suvoda LLC, Risklick, Lokavant, Research Solutions, and others.
To explore which MedTech key players in the Artificial Intelligence (AI) in Clinical Trials market are set to emerge as trendsetters, visit Key Artificial Intelligence (AI) in Clinical Trials Companies
https://www.delveinsight.com/sample-request/ai-in-clinical-trials-market?utm_source=openpr&utm_medium=pressrelease&utm_campaign=gpr
Analysis on the Artificial Intelligence (AI) in Clinical Trials Market Landscape
To meet the growing needs of clinical trials, leading companies in the AI in Clinical Trials market are creating advanced AI solutions aimed at improving trial efficiency, optimizing patient recruitment, and enhancing clinical trial design at investigator sites. For example, in April 2023, ConcertAI introduced CTO 2.0, a clinical trial optimization platform that utilizes publicly available data and partner insights to deliver comprehensive site and physician-level trial data. This tool provides key operational metrics and site profiles to evaluate trial performance and site capabilities. Additionally, CTO 2.0 assists sponsors in complying with FDA requirements for inclusive trial outcomes, promoting a shift toward community-based trials with more streamlined and patient-centric designs.
As a result of these advancements, the software segment is projected to experience significant growth throughout the forecast period, contributing to the overall expansion of the AI in clinical trials market.
Scope of the Artificial Intelligence (AI) in Clinical Trials Market Report
• Coverage: Global
• Study Period: 2022-2032
• Artificial Intelligence (AI) in Clinical Trials Market Segmentation By Product Type: Software and Services
• Artificial Intelligence (AI) in Clinical Trials Market Segmentation By Technology Type: Machine Learning (ML), Natural Language Processing (NLP), and Others
• Artificial Intelligence (AI) in Clinical Trials Market Segmentation By Application Type: Clinical Trial Design & Optimization, Patient Identification & Recruitment, Site Identification & Trial Monitoring, and Others
• Artificial Intelligence (AI) in Clinical Trials Market Segmentation By Therapeutic Area: Oncology, Cardiology, Neurology, Infectious Disease, Immunology, and Others
• Artificial Intelligence (AI) in Clinical Trials Market Segmentation By End-User: Pharmaceutical & Biotechnology Companies and Medical Device Companies
• Artificial Intelligence (AI) in Clinical Trials Market Segmentation By Geography: North America, Europe, Asia-Pacific, and Rest of the World
• Key Artificial Intelligence (AI) in Clinical Trials Companies: TEMPUS, NetraMark, ConcertAI, AiCure, Medpace, Inc., ICON plc, Charles River Laboratories, Dassault Systèmes, Oracle, Certara, Cytel Inc., Phesi, DeepHealth, Unlearn.ai, Inc., H1, TrialX, Suvoda LLC, Risklick, Lokavant, Research Solutions, and others
• Porter’s Five Forces Analysis, Product Profiles, Case Studies, KOL’s Views, Analyst’s View
Interested in knowing how the Artificial Intelligence (AI) in Clinical Trials market will grow by 2032? Click to get a snapshot of the Artificial Intelligence (AI) in Clinical Trials Market Analysis
https://www.delveinsight.com/sample-request/ai-in-clinical-trials-market?utm_source=openpr&utm_medium=pressrelease&utm_campaign=gpr
Table of Contents
1 Artificial Intelligence (AI) in Clinical Trials Market Report Introduction
2 Artificial Intelligence (AI) in Clinical Trials Market Executive summary
3 Regulatory and Patent Analysis
4 Artificial Intelligence (AI) in Clinical Trials Market Key Factors Analysis
5 Porter’s Five Forces Analysis
6 COVID-19 Impact Analysis on Artificial Intelligence (AI) in Clinical Trials Market
7 Artificial Intelligence (AI) in Clinical Trials Market Layout
8 Global Company Share Analysis – Key Artificial Intelligence (AI) in Clinical Trials Companies
9 Company and Product Profiles
10 Project Approach
11 Artificial Intelligence (AI) in Clinical Trials Market Drivers
12 Artificial Intelligence (AI) in Clinical Trials Market Barriers
13 About DelveInsight
Latest Reports by DelveInsight
• Percutaneous Arterial Closure Device Market: https://www.delveinsight.com/report-store/vascular-closure-devices-market
• Transdermal Drug Delivery Devices: https://www.delveinsight.com/report-store/transdermal-drug-delivery-devices-market
• Infusion Pumps Market: https://www.delveinsight.com/report-store/infusion-pumps-market
• Acute Radiation Syndrome Market: https://www.delveinsight.com/report-store/acute-radiation-syndrome-pipeline-insight
• Human Papillomavirus (HPV) Market: https://www.delveinsight.com/report-store/human-papillomavirus-hpv-market
• Blood Gas And Electrolyte Analyzers Market: https://www.delveinsight.com/report-store/blood-gas-and-electrolyte-analyzers-market
Contact Us
Gaurav Bora
info@delveinsight.com
+14699457679
www.delveinsight.com
Connect With Us at:
LinkedIn | Facebook | Twitter
About DelveInsight
DelveInsight is a leading Business Consultant and Market Research firm focused exclusively on life sciences. It supports Pharma companies by providing end-to-end comprehensive solutions to improve their performance.
Get hassle-free access to all the healthcare and pharma market research reports through our subscription-based platform PharmDelve.
This release was published on openPR.
AI Insights
New York Seeks to RAISE the Bar on AI Regulation – Tech & Sourcing @ Morgan Lewis
On June 12, 2025, New York state lawmakers passed the Responsible AI Safety and Education Act (the RAISE Act), which aims to safeguard against artificial intelligence (AI)-driven disaster scenarios by focusing on the largest AI model developers; the bill now heads to the governor’s desk for final approval. The RAISE Act is the latest state-level legislative effort to regulate AI, a movement that may continue to gain momentum after a 10-year moratorium on state AI regulation was removed from the recently passed One Big Beautiful Bill.
Background and Core Provisions
Inspired by California’s SB 1047, which was vetoed by California Governor Gavin Newsom in September 2024 over concerns that it could stifle innovation, the RAISE Act aims to prevent so-called “frontier AI models” from contributing to “critical harm.” For the purposes of the RAISE Act, “critical harm” is defined as an event in which AI causes the death or injury of more than 100 people, or more than $1 billion in damages to rights in money or property, caused or materially enabled by a large developer’s creation, use, storage, or release of a frontier model, through either (1) the creation or use of a chemical, biological, radiological, or nuclear weapon or (2) an artificial intelligence model engaging in conduct that is both (a) done with limited human intervention and (b) such that it would, if committed by a human, constitute a crime specified in the penal law that requires intent, recklessness, or gross negligence, or the soliciting or aiding and abetting of such a crime.
Unlike SB 1047, which faced criticism for casting too wide a net over general AI systems, the RAISE Act targets only “frontier” models developed by companies that meet both of the following criteria: (1) a training cost threshold, where the applicable AI model was trained using more than $100 million in computing resources, or more than $5 million in computing resources where a smaller artificial intelligence model was trained on a larger artificial intelligence model and has similar capabilities to the larger model; and (2) the model is made available to New York residents. To the extent the RAISE Act aligns with similar state-level regulations and restrictions, this would theoretically allow some room for innovation by entities (like startup companies and research organizations) less likely to cause such critical harm.
If a company meets both criteria and is therefore subject to the jurisdiction of the RAISE Act, it will need to comply with all the following before deploying any frontier AI model:
- Implement a written safety and security protocol
- Retain an unredacted version of such safety and security protocol for as long as the frontier model is deployed, plus five years
- Conspicuously publish a copy of the safety and security protocol and transmit such protocol to the division of homeland security and emergency services
- Record information on specific tests and test results used in any assessment of the frontier AI model
From a practical perspective, requirements such as recordation of information on testing of any frontier AI model may push smaller startups and research organizations out of the market to the extent the resources necessary to maintain such information present additional and costly overhead.
Enforcement and Exceptions
The RAISE Act empowers the New York attorney general to levy civil penalties of up to $10 million for initial violations and up to $30 million for subsequent violations by noncompliant covered companies. This includes penalties for violations of a developer’s transparency obligations as specified above or as required elsewhere in the RAISE Act, such as the requirement that covered companies retain an independent auditor annually to review compliance with the law. However, covered companies may make “appropriate redactions” to their safety protocols when necessary to protect public safety, safeguard trade secrets, maintain confidential information as required by law, or protect employee or customer privacy.
Looking Ahead
The bill’s fate remains uncertain. Our team is monitoring developments closely, including potential impacts on commercial contracting, compliance obligations, and technology adoption.
AI Insights
What Is Artificial Intelligence? Explained Simply With Real-Life Examples – The Times of India