AI Research

Relationships are complicated! An analysis of relationships between datasets on the Web

Results

We compare the performance of the four methods on manually annotated ground truth data, then apply the best-performing method to a large corpus of Web datasets to understand how prevalent the different provenance relationships between those datasets are.

We generated a corpus of dataset metadata by crawling the Web for pages whose schema.org metadata indicates that the page contains a dataset. We then limited the corpus to datasets that have persistent, dereferenceable identifiers (i.e., a unique code that permanently identifies a digital object and allows access to it even if the original location or website changes). This corpus includes 2.7 million dataset-metadata entries.
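The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each crawled page has already been parsed into a JSON-LD-style dictionary, and it treats DOI-style identifiers as a stand-in for "persistent identifiers", since the text does not say which identifier schemes were accepted. The function names are hypothetical.

```python
import re

# Assumption: DOI-style identifiers stand in for "persistent, dereferenceable
# identifiers"; the article does not list the schemes that were accepted.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def is_dataset(record):
    """Return True if the record declares the schema.org type Dataset."""
    types = record.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return "Dataset" in types


def persistent_id(record):
    """Return the first DOI-like identifier (lowercased), or None."""
    ids = record.get("identifier", [])
    if isinstance(ids, str):
        ids = [ids]
    for value in ids:
        if not isinstance(value, str):
            continue
        match = DOI_PATTERN.match(value.strip())
        if match:
            return match.group(1).lower()
    return None


def filter_corpus(records):
    """Keep only Dataset records that carry a persistent identifier."""
    return [r for r in records if is_dataset(r) and persistent_id(r)]


# Hypothetical sample records, for illustration only.
sample = [
    {"@type": "Dataset", "identifier": "https://doi.org/10.1234/abc"},
    {"@type": "Dataset", "identifier": "local-42"},
    {"@type": "WebPage", "identifier": "doi:10.1234/xyz"},
]
kept = filter_corpus(sample)
```

Only the first sample record survives: the second has no persistent identifier and the third is not typed as a dataset.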

To generate ground truth for training and evaluation, we manually labeled 2,178 dataset pairs. The labelers had access to all metadata fields for these datasets, such as name, description, provider, temporal and spatial coverage, and so on.

We compared the performance of the four different methods — schema.org, heuristics-based, gradient boosted decision trees (GBDT), and T5 — across various dataset relationship categories (detailed breakdown in the paper). The ML methods (GBDT and T5) outperform the heuristics-based approach in identifying dataset relationships. GBDT consistently achieves the highest F1 scores across various categories, with T5 performing similarly well.
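For readers unfamiliar with the metric, per-category F1 on a labeled set of dataset pairs can be computed as in the sketch below. The category names and labels are hypothetical placeholders, not the relationship categories or results from the paper.

```python
from collections import Counter


def f1_per_category(true_labels, predicted_labels):
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each category."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative
    scores = {}
    for category in set(true_labels) | set(predicted_labels):
        denom = 2 * tp[category] + fp[category] + fn[category]
        scores[category] = 2 * tp[category] / denom if denom else 0.0
    return scores


# Hypothetical labels for four dataset pairs (illustrative only).
truth = ["version", "version", "replica", "unrelated"]
predicted = ["version", "replica", "replica", "unrelated"]
scores = f1_per_category(truth, predicted)
```

Comparing methods then amounts to running each method's predictions through the same function and comparing the resulting scores category by category.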



Artificial Intelligence at Bayer – Emerj Artificial Intelligence Research

Bayer is a global life sciences company operating across Pharmaceuticals, Consumer Health, and Crop Science. In fiscal 2024, the group reported €46.6 billion in sales and 94,081 employees, a scale that makes internal AI deployments consequential for workflow change and ROI.

The company invests heavily in research, with more than €6 billion allocated to R&D in 2024, and its leadership frames AI as an enabler for both sustainable agriculture and patient-centric medicine. Bayer’s own materials highlight AI’s role in planning and analyzing clinical trials as well as accelerating crop protection discovery pipelines.

This article examines two mature, internally used applications that convey the central role AI plays in Bayer’s core business goals:

  • Herbicide discovery in crop science: Applying AI to narrow down molecular candidates and identify new modes of action.
  • Clinical trial analytics in pharmaceuticals: Ingesting heterogeneous trial and device data to accelerate compliant analysis.

AI-Assisted Herbicide Discovery

Weed resistance is a mounting global challenge. Farmers in the US and Brazil are facing species resistant to multiple herbicide classes, driving up costs and threatening crop yields. Traditional herbicide discovery is slow — often 12 to 15 years from concept to market — and expensive, with high attrition during early screening.

Bayer’s Crop Science division has turned to AI to help shorten these timelines. Independent reporting notes Bayer’s pipeline includes Icafolin, its first new herbicide mode of action in decades, expected to launch in Brazil in 2028, with AI used upstream to accelerate the discovery of new modes of action.

Reuters reports that Bayer’s approach uses AI to match weed protein structures with candidate molecules, compressing the early discovery funnel by triaging millions of possibilities against pre-determined criteria. Bayer’s CropKey overview describes a profile-driven approach, where candidate molecules are designed to meet safety, efficacy, and environmental requirements from the start.

The company claims that CropKey has already identified more than 30 potential molecular targets and validated over 10 as entirely new modes of action. These figures, while promising, remain unverified until they are independently confirmed.

For Bayer’s discovery scientists, AI-guided triage changes workflows by:

  • Reducing early-stage wet-lab cycles by focusing on higher-probability matches between proteins and molecules.
  • Integrating safety and environmental criteria into the digital screen, filtering out compounds unlikely to meet regulatory thresholds.
  • Advancing promising molecules sooner, enabling earlier testing and potentially compressing development timelines from 15 years to 10.

Coverage by both Reuters and the Wall Street Journal notes this strategy is expected to reduce attrition and accelerate discovery-to-commercialization timelines.

The CropKey program has been covered by multiple independent outlets, a signal of maturity beyond a single press release. Reuters reports Bayer’s assertion that AI has tripled the number of new modes of action identified in early research compared to a decade ago.

The upcoming Icafolin herbicide, expected for commercial release in 2028, demonstrates that CropKey outputs are making their way into the regulatory pipeline. The presence of both media scrutiny and near-term launch candidates suggests CropKey is among Bayer’s most advanced AI deployments.

Video explaining Bayer’s CropKey process in crop protection discovery. (Source: Bayer)

By focusing AI on high-ROI bottlenecks in research and development, Bayer demonstrates how machine learning can trim low-value screening cycles, advancing only the most promising candidates into experimental trials. At the same time, acceleration figures reported by the company should be treated as claims until they are corroborated across multiple seasons, geographies, and independent trials.

Clinical Trial Analytics Platform (ALYCE)

Pharmaceutical development increasingly relies on complex data streams: electronic health records (EHR), site-based case report forms, patient-reported outcomes, and telemetry from wearables in decentralized trials. Managing this data volume and variety strains traditional data warehouses and slows regulatory reporting.

Bayer developed ALYCE (Advanced Analytics Platform for the Clinical Data Environment) to handle this complexity. In a PHUSE conference presentation, Bayer engineers describe the platform as a way to ingest diverse data, ensure governance, and deliver analytics more quickly while maintaining compliance.

The presentation describes ALYCE’s architecture as using a layered “Bronze/Silver/Gold” data lake approach. An example trial payload included approximately 300,000 files (1.6 TB) for 80 patients, requiring timezone harmonization, device ID mapping, and error handling before data could be standardized to SDTM (Study Data Tabulation Model) formats. Automated pipelines provide lineage, quarantine checks, and notifications. These technical details were presented publicly to peers, reinforcing their credibility beyond internal marketing.
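A single Bronze-to-Silver step of the kind described above might look roughly like the following sketch. It is an illustration under stated assumptions, not Bayer's implementation: the record fields (`device_id`, `timestamp`, `value`), the device-to-subject lookup table, and the function name are all hypothetical, and timestamps are assumed to arrive as ISO 8601 strings carrying a UTC offset.

```python
from datetime import datetime, timezone


def bronze_to_silver(raw_records, device_to_subject):
    """Harmonize timestamps to UTC, map device IDs to subjects,
    and quarantine records that fail either step."""
    silver, quarantine = [], []
    for rec in raw_records:
        try:
            stamp = datetime.fromisoformat(rec["timestamp"])
            if stamp.tzinfo is None:
                raise ValueError("timestamp has no UTC offset")
            subject = device_to_subject[rec["device_id"]]
        except (KeyError, ValueError, TypeError) as err:
            # Keep the raw record and the reason so the failure is traceable.
            quarantine.append({"record": rec, "reason": repr(err)})
            continue
        silver.append({
            "subject_id": subject,
            "utc_time": stamp.astimezone(timezone.utc).isoformat(),
            "value": rec.get("value"),
        })
    return silver, quarantine


# Hypothetical device telemetry, for illustration only.
devices = {"dev-1": "SUBJ-001"}
raw = [
    {"device_id": "dev-1", "timestamp": "2024-03-01T08:30:00+01:00", "value": 72},
    {"device_id": "dev-9", "timestamp": "2024-03-01T08:31:00+01:00", "value": 70},
    {"device_id": "dev-1", "timestamp": "not a time", "value": 68},
]
silver, quarantine = bronze_to_silver(raw, devices)
```

Here the first record is standardized, while the unknown device and the unparseable timestamp are set aside with a reason rather than silently dropped, which is the role the quarantine checks play in the description above.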

For statisticians and clinical programmers, ALYCE claims to:

  • Standardize ingestion across structured (CRFs), semi-structured (EHR extracts), and unstructured (device telemetry) sources.
  • Automate quality checks through pipelines that reduce manual intervention and free staff up to focus on analysis.
  • Enable earlier insights by preparing analysis-ready datasets faster, shortening the lag between data collection and review.

These objectives are consistent with Bayer’s broader statement that AI is being used to plan and analyze clinical trials safely and efficiently.

PHUSE is a respected industry forum where sponsors share methods with peers, and Bayer’s willingness to disclose technical details indicates ALYCE is in production. While Bayer has not released precise cycle-time savings, its emphasis on elastic storage, regulatory readiness, and speed suggests measurable efficiency gains.

Given the specificity of the presentation — real-world payloads, architecture diagrams, and validation processes — ALYCE appears to be a mature platform actively supporting Bayer’s clinical trial programs.

Screenshot from Bayer’s PHUSE presentation illustrating ALYCE’s automated ELTL pipeline.
(Source: PHUSE)

Bayer’s commitment to ALYCE reflects its broader effort to modernize and scale clinical development. By consolidating varied data streams into a single, automated environment, the company positions itself to shorten study timelines, reduce operational overhead, and accelerate the movement of promising therapies from discovery to patients. This infrastructure also prepares Bayer to expand AI-driven analytics across additional therapeutic areas, supporting long-term competitiveness in a highly regulated industry.

Bayer has not published specific cycle-time reductions or quantified cost savings tied directly to ALYCE. Even so, its presentation of detailed payload volumes and pipeline architecture at PHUSE indicates that the platform is actively deployed and has been exposed to peer scrutiny. Based on those disclosures and parallels with other pharma AI implementations, reasonable expectations include faster data review cycles, earlier anomaly detection, and improved compliance readiness. These outcomes, though not yet publicly validated, suggest ALYCE is reshaping Bayer’s trial workflows in ways that could yield significant long-term returns.



The STARD-AI reporting guideline for diagnostic accuracy studies using artificial intelligence

    Pascal AI raises $3.1M to scale autonomous investment research workflows

    Pascal AI Labs Pvt. Ltd., a startup building agentic investment research workflows, today announced that it has raised $3.1 million in funding to accelerate product development of its autonomous investment workflows, expand into the U.S. and sign strategic data partnerships.

    Founded in 2024, Pascal AI offers a vertical artificial intelligence platform purpose-built for the investment management industry that enables autonomous, agentic workflows. The company leverages AI to transform how research is conducted, moving well beyond information retrieval to reasoning and acting like a seasoned investor while also prioritizing trust, security and auditability.

    Pascal AI automates the full investment lifecycle, from gathering insights buried in transcripts, filings and internal notes to refreshing models, generating investment memos and performing comparative analyses. The platform’s AI agents extract key performance indicators, surface red flags, draft investor communications and update financial models, all in a matter of minutes rather than hours or days.

    The company says that one of its key differentiators is its AI’s ability to learn from each firm’s proprietary processes and history, enabling it to reason like an experienced analyst rather than a generic AI. The result, it says, is a faster research process along with a continuously updated, holistic view of exposures and performance.

    On the security side, the platform employs a knowledge graph to ensure that all actions are fully auditable and traceable. Added role-based permissions and support for on-premises deployment also allow high-stakes institutions to confidently rely on its autonomous workflows.

    “The future of investment management is autonomous investment research,” said co-founder and Chief Executive Vibhav Viswanathan. “Pascal AI is systematically automating complex investment workflows with the long-term vision of creating a fully autonomous investment research company.”

    While still relatively young, Pascal AI is already finding success, with its platform deployed by more than 25 financial firms across the U.S. and the Asia-Pacific region, including $2 billion private equity funds and a top-three global asset manager with more than $1 trillion in assets under management.

    Kalaari Capital Advisors Pvt. Ltd. led the seed round, with Norwest Venture Partners LP, Info Edge Ventures Pvt. Ltd., Antler Global Ltd. and leading angel investors also participating.

    “At Kalaari, we believe the next decade will see a decisive shift toward autonomous research platforms that can scale human judgment with machine intelligence,” said Sampath P, a partner at Kalaari Capital. “Pascal AI is at the forefront of this transformation — building secure, auditable and truly agentic workflows that don’t just process information but reason like an investor.”
