AI Research

AI Algorithms Now Capable of Predicting Drug-Biological Target Interactions to Streamline Pharmaceutical Research – geneonline.com


AI Research

Instagram wrongly says some users breached child sex abuse rules



Graham Fraser

Technology Reporter

Image: The logos of Instagram and Facebook (Getty Images)

Instagram users have told the BBC of the “extreme stress” of having their accounts banned after being wrongly accused by the platform of breaching its rules on child sexual exploitation.

The BBC has been in touch with three people who were told by parent company Meta that their accounts were being permanently disabled, only to have them reinstated shortly after their cases were highlighted to journalists.

“I’ve lost endless hours of sleep, felt isolated. It’s been horrible, not to mention having an accusation like that over my head,” one of the men told BBC News.

Meta declined to comment.

BBC News has been contacted by more than 100 people who claim to have been wrongly banned by Meta.

Some talk of a loss of earnings after being locked out of their business pages, while others highlight the pain of no longer having access to years of pictures and memories. Many point to the impact it has had on their mental health.

Over 27,000 people have signed a petition that accuses Meta’s moderation system, powered by artificial intelligence (AI), of falsely banning accounts and of operating an appeal process that is unfit for purpose.

Thousands of people are also in Reddit forums dedicated to the subject, and many users have posted on social media about being banned.

Meta has previously acknowledged a problem with Facebook Groups but denied its platforms were more widely affected.

‘Outrageous and vile’

The BBC has changed the names of the people in this piece to protect their identities.

David, from Aberdeen in Scotland, was suspended from Instagram on 4 June. He was told he had not followed Meta’s community standards on child sexual exploitation, abuse and nudity.

He appealed that day, and his account was then permanently disabled, along with his associated Facebook and Facebook Messenger accounts.

David found a Reddit thread, where many others were posting that they had also been wrongly banned over child sexual exploitation.

“We have lost years of memories, in my case over 10 years of messages, photos and posts – due to a completely outrageous and vile accusation,” he told BBC News.

He said Meta was “an embarrassment”, with AI-generated replies and templated responses to his questions. He still has no idea why his account was banned.

“I’ve lost endless hours of sleep, extreme stress, felt isolated. It’s been horrible, not to mention having an accusation like that over my head.

“Although you can speak to people on Reddit, it is hard to go and speak to a family member or a colleague. They probably don’t know the context that there is a ban wave going on.”

The BBC raised David’s case with Meta on 3 July, alongside those of a number of other people who said they had been wrongly banned over child sexual exploitation. Within hours, his account was reinstated.

In a message sent to David, and seen by the BBC, the tech giant said: “We’re sorry that we’ve got this wrong, and that you weren’t able to use Instagram for a while. Sometimes, we need to take action to help keep our community safe.”

“It is a massive weight off my shoulders,” said David.

Faisal was banned from Instagram on 6 June over alleged child sexual exploitation and, like David, found his Facebook account suspended too.

The student from London is embarking on a career in the creative arts, and was starting to earn money via commissions on his Instagram page when it was suspended. He appealed, feeling he had done nothing wrong, and his account was then banned a few minutes later.

He told BBC News: “I don’t know what to do and I’m really upset.

“[Meta] falsely accuse me of a crime that I have never done, which also damages my mental state and health and it has put me into pure isolation throughout the past month.”

His case was also raised with Meta by the BBC on 3 July. About five hours later, his accounts were reinstated. He received the exact same email as David, with the apology from Meta.

He told BBC News he was “quite relieved” after hearing the news. “I am trying to limit my time on Instagram now.”

Faisal said he remained upset over the incident, and is now worried the account ban might come up if any background checks are made on him.

A third user, Salim, told BBC News that he had also had accounts falsely banned for child sexual exploitation violations.

He highlighted his case to journalists, stating that appeals are “largely ignored”, business accounts were being affected, and AI was “labelling ordinary people as criminal abusers”.

Almost a week after he was banned, his Instagram and Facebook accounts were reinstated.

What’s gone wrong?

When asked by BBC News, Meta declined to comment on the cases of David, Faisal, and Salim, and did not answer questions about whether it had a problem with wrongly accusing users of child abuse offences.

In one part of the world, however, the company does appear to have acknowledged a wider issue.

The BBC has learned that the chair of the Science, ICT, Broadcasting, and Communications Committee at the National Assembly in South Korea said last month that Meta had acknowledged the possibility of wrongful suspensions for people in her country.

Dr Carolina Are, a blogger and researcher into social media moderation at Northumbria University, said it was hard to know what the root of the problem was because Meta was not being open about it.

However, she suggested it could be due to recent changes to the wording of some of its community guidelines and an ongoing lack of a workable appeal process.

“Meta often don’t explain what it is that triggered the deletion. We are not privy to what went wrong with the algorithm,” she told BBC News.

In a previous statement, Meta said: “We take action on accounts that violate our policies, and people can appeal if they think we’ve made a mistake.”

Meta, in common with all big technology firms, has come under increased pressure in recent years from regulators and authorities to make its platforms safe spaces.

Meta told the BBC it used a combination of people and technology to find and remove accounts that broke its rules, and that it was not aware of a spike in erroneous account suspensions.

Meta says its child sexual exploitation policy relates to children and “non-real depictions with a human likeness”, such as art, content generated by AI or fictional characters.

Meta also told the BBC a few weeks ago it uses technology to identify potentially suspicious behaviours, such as adult accounts being reported by teen accounts, or adults repeatedly searching for “harmful” terms.

Meta states that when it becomes aware of “apparent child exploitation”, it reports it to the National Center for Missing and Exploited Children (NCMEC) in the US. NCMEC told BBC News it makes all of those reports available to law enforcement around the world.


AI Research

A Semiconductor Leader Poised for AI-Driven Growth Despite Near-Term Headwinds


The semiconductor industry is at a pivotal juncture, fueled by explosive demand for advanced chips powering artificial intelligence (AI), 5G, and high-performance computing. At the heart of this revolution is Lam Research (LRCX), a leader in semiconductor equipment that stands to benefit from secular tailwinds—even as geopolitical risks cloud near-term visibility. This article examines whether LRCX’s valuation, earnings momentum, and strategic positioning justify a buy rating despite a cautious Zacks Rank.

Valuation: Undervalued PEG Ratio Signals Opportunity

Lam Research’s PEG ratio of 1.24 (as of July 2025) remains below both the semiconductor equipment industry average of 1.55 and the broader Electronics-Semiconductors sector’s average of 1.59. This metric, calculated by dividing the P/E ratio by the 5-year EBITDA growth rate, suggests LRCX is trading at a discount to its growth prospects.

The PEG ratio’s allure lies in its dual consideration of valuation and growth. A ratio under 1.5 typically indicates undervaluation, and LRCX’s 1.24 places it squarely in this category. Even if we use the industry average cited in earlier research (2.09), LRCX’s PEG remains compelling. This discount is puzzling given its dominant market share (15% of global wafer fabrication equipment, or WFE) and its role in critical technologies like atomic layer deposition (ALD), essential for AI chip production.
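As a rough illustration of the PEG arithmetic described above, here is a minimal Python sketch. The figures come from the article; the helper function, the ~1.5 screening threshold, and the implied-growth check are illustrative only (the implied growth rate is derived from the article's numbers, not quoted in it).

```python
# Minimal sketch of the PEG screen discussed above; figures come from the article,
# the function itself is illustrative rather than any vendor's screening API.

def peg_ratio(pe_ratio: float, growth_rate_pct: float) -> float:
    """PEG = price/earnings ratio divided by the expected growth rate (in percent)."""
    return pe_ratio / growth_rate_pct

# The article's comparison: a PEG under ~1.5 is read as trading at a discount to growth.
pegs = {
    "LRCX": 1.24,
    "Semiconductor equipment industry": 1.55,
    "Electronics-Semiconductors sector": 1.59,
}
for name, peg in pegs.items():
    verdict = "discount to growth" if peg < 1.5 else "no discount"
    print(f"{name}: PEG {peg:.2f} -> {verdict}")

# Example: the article's 21.6x forward P/E and 1.24 PEG together imply an assumed
# growth rate of roughly 21.6 / 1.24 ≈ 17.4% (a derived figure, not one quoted in
# the article, and the PEG may in fact be built on a different P/E base).
print(f"PEG at 21.6x P/E and 17.4% growth: {peg_ratio(21.6, 17.4):.2f}")
```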

Earnings Momentum: Positive Revisions Amid Industry Growth

Lam’s earnings revisions tell a story of resilience. Despite macroeconomic headwinds, analysts have raised fiscal 2025 EPS estimates to $4.00, a 5% increase from 2024 levels. This upward momentum aligns with LRCX’s 48% year-over-year (YoY) earnings growth projection for Q2 2025.

The semiconductor equipment sector is a prime beneficiary of AI’s rise. AI chips require advanced nodes (e.g., 3nm and below), demanding cutting-edge equipment like LRCX’s etch and deposition tools. This structural demand, paired with rising WFE spending (expected to hit $130 billion by 2027), positions LRCX for sustained growth.

The Zacks Rank Dilemma: Why Hold Doesn’t Tell the Full Story

Lam Research’s Zacks Rank #4 (Sell) as of July 2025 reflects near-term risks, including:
• Geopolitical tensions: U.S.-China trade disputes could disrupt LRCX’s China revenue (a major market).
• Delayed NAND spending: A slowdown in NAND memory chip investments has dampened short-term demand.

However, the Zacks Rank focuses on near-term volatility over the next 12–24 months. It underweights long-term catalysts such as:
1. AI-driven capex boom: Chipmakers like TSMC and Samsung are ramping up AI-specific foundries, requiring Lam’s tools.
2. Potential China trade thaw: If U.S. sanctions ease, LRCX could regain access to Chinese clients, boosting revenue.

The Rank’s caution is understandable, but investors should separate short-term noise from LRCX’s strong fundamentals:
• Forward P/E of 21.6x, below the semiconductor sector’s 35.3x average.
• ROE of 53%, reflecting operational efficiency.

Catalysts for a Re-Rating: AI and Geopolitical Shifts

The key catalysts to watch for a valuation rebound are:
1. AI Chip Demand: NVIDIA’s $200 billion AI chip roadmap and Google’s quantum computing investments underscore the need for advanced fabrication tools. LRCX’s ALD systems are critical for these chips.
2. Trade Policy Shifts: A potential easing of U.S.-China trade restrictions could unlock $500 million+ in annual revenue for LRCX.
3. Q3 2025 Earnings: Management’s guidance of $1.00 EPS and $4.65 billion in revenue (both above consensus) could surprise positively.

Risks and Conclusion: A Buy for the Next 12 Months

Lam Research isn’t without risks:
• Execution risks: High R&D costs ($1.3 billion annually) could pressure margins.
• Macroeconomic slowdown: A recession could delay chip capex.

However, the long-term case for LRCX is too strong to ignore. Its PEG discount, earnings momentum, and strategic position in AI infrastructure justify a buy rating for the next 12 months. Investors should aim for a target price of $110 (25x forward P/E), with upside if China-related risks abate.
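A back-of-the-envelope check of how a target like this falls out of a forward multiple. The implied forward EPS below is derived from the article's $110 target and 25x multiple, not quoted directly in it.

```python
# Back-of-the-envelope check of the $110 target at a 25x forward multiple.
# The implied forward EPS is derived from those two numbers, not stated in the article.

target_price = 110.0     # article's 12-month price target
forward_multiple = 25.0  # forward P/E assumed behind that target

implied_forward_eps = target_price / forward_multiple
print(f"Implied forward EPS: ${implied_forward_eps:.2f}")  # ≈ $4.40

# For comparison, applying the same 25x multiple to the article's fiscal 2025
# EPS estimate of $4.00 would imply a target of about $100.
print(f"25x on $4.00 EPS: ${forward_multiple * 4.00:.0f}")
```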

In sum, LRCX’s valuation and growth trajectory make it a compelling play on the AI revolution. While near-term headwinds justify caution, the re-rating potential is undeniable.

Investment thesis: Buy LRCX at current levels, with a 12-month price target of $110.
Risk rating: Moderate (geopolitical and macro risks).
Hold for: 12–18 months for valuation expansion.




AI Research

What 300GB of AI Research Reveals About the True Limits of “Zero-Shot” Intelligence


Authors:

(1) Vishaal Udandarao, Tübingen AI Center, University of Tübingen, and University of Cambridge (equal contribution);

(2) Ameya Prabhu, Tübingen AI Center, University of Tübingen, and University of Oxford (equal contribution);

(3) Adhiraj Ghosh, Tübingen AI Center, University of Tübingen;

(4) Yash Sharma, Tübingen AI Center, University of Tübingen;

(5) Philip H.S. Torr, University of Oxford;

(6) Adel Bibi, University of Oxford;

(7) Samuel Albanie, University of Cambridge (equal advising, order decided by a coin flip);

(8) Matthias Bethge, Tübingen AI Center, University of Tübingen (equal advising, order decided by a coin flip).

Abstract and 1. Introduction

2 Concepts in Pretraining Data and Quantifying Frequency

3 Comparing Pretraining Frequency & “Zero-Shot” Performance and 3.1 Experimental Setup

3.2 Result: Pretraining Frequency is Predictive of “Zero-Shot” Performance

4 Stress-Testing the Concept Frequency-Performance Scaling Trend and 4.1 Controlling for Similar Samples in Pretraining and Downstream Data

4.2 Testing Generalization to Purely Synthetic Concept and Data Distributions

5 Additional Insights from Pretraining Concept Frequencies

6 Testing the Tail: Let It Wag!

7 Related Work

8 Conclusions and Open Problems, Acknowledgements, and References

Part I

Appendix

A. Concept Frequency is Predictive of Performance Across Prompting Strategies

B. Concept Frequency is Predictive of Performance Across Retrieval Metrics

C. Concept Frequency is Predictive of Performance for T2I Models

D. Concept Frequency is Predictive of Performance across Concepts only from Image and Text Domains

E. Experimental Details

F. Why and How Do We Use RAM++?

G. Details about Misalignment Degree Results

H. T2I Models: Evaluation

I. Classification Results: Let It Wag!

Abstract

Web-crawled pretraining datasets underlie the impressive “zero-shot” evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of “zero-shot” generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted for during “zero-shot” evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets?

We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting “zero-shot” generalization, multimodal models require exponentially more data to achieve linear improvements in downstream “zero-shot” performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets [79], and testing on purely synthetic data distributions [51]. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the Let it Wag! benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to “zero-shot” generalization capabilities under large-scale training paradigms remains to be found.

1 Introduction

Multimodal models like CLIP [91] and Stable Diffusion [96] have revolutionized performance on downstream tasks—CLIP is now the de-facto standard for “zero-shot” image recognition [133, 72, 126, 48, 132] and image-text retrieval [46, 64, 24, 117, 129], while Stable Diffusion is now the de-facto standard for “zero-shot” text-to-image (T2I) generation [93, 17, 96, 41]. In this work, we investigate this empirical success through the lens of zero-shot generalization [69], which refers to the ability of the model to apply its learned knowledge to new unseen concepts. Accordingly, we ask: Are current multimodal models truly capable of “zero-shot” generalization?

To address this, we conducted a comparative analysis involving two main factors: (1) the performance of models across various downstream tasks and (2) the frequency of test concepts within their pretraining datasets. We compiled a comprehensive list of 4,029 concepts[1] from 27 downstream tasks spanning classification, retrieval, and image generation, assessing the performance against these concepts. Our analysis spanned five large-scale pretraining datasets with different scales, data curation methods and sources (CC-3M [107], CC-12M [27], YFCC-15M [113], LAION-Aesthetics [103], LAION-400M [102]), and evaluated the performance of 10 CLIP models and 24 T2I models, spanning different architectures and parameter scales. We consistently find across all our experiments that, across concepts, the frequency of a concept in the pretraining dataset is a strong predictor of the model’s performance on test examples containing that concept. Notably, model performance scales linearly as the concept frequency in pretraining data grows exponentially, i.e., we observe a consistent log-linear scaling trend. We find that this log-linear trend is robust to controlling for correlated factors (similar samples in pretraining and test data [79]) and testing across different concept distributions along with samples generated entirely synthetically [51].
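To make the log-linear trend concrete, here is a minimal sketch (not the authors' released code; the frequency and accuracy arrays are placeholder values) of regressing “zero-shot” accuracy on the log of a concept's pretraining frequency:

```python
# Minimal sketch (not the authors' code) of the log-linear trend described above:
# downstream accuracy is regressed on the log of a concept's pretraining frequency.
import numpy as np

# Placeholder data: pretraining frequency and "zero-shot" accuracy per concept.
concept_frequency = np.array([10, 100, 1_000, 10_000, 100_000], dtype=float)
zero_shot_accuracy = np.array([0.12, 0.25, 0.38, 0.51, 0.63])

# Fit accuracy ~ slope * log10(frequency) + intercept.
slope, intercept = np.polyfit(np.log10(concept_frequency), zero_shot_accuracy, deg=1)
print(f"accuracy ≈ {slope:.3f} * log10(freq) + {intercept:.3f}")

# The fitted slope reads as: each 10x increase in concept frequency buys a roughly
# constant additive gain in accuracy -- exponentially more data for linear gains.
```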

Our findings indicate that the impressive empirical performance of multimodal models like CLIP and Stable Diffusion can be largely attributed to the presence of test concepts within their vast pretraining datasets, thus their reported empirical performance does not constitute “zero-shot” generalization. Quite the contrary, these models require exponentially more data on a concept to linearly improve their performance on tasks pertaining to that concept, highlighting extreme sample inefficiency.

In our analysis, we additionally document the distribution of concepts encountered in pretraining data and find that:

• Concept Distribution: Across all pretraining datasets, the distribution of concepts is long-tailed (see Fig. 5 in Sec. 5), which indicates that a large fraction of concepts are rare. However, given the extreme sample inefficiency observed, what is rare is not properly learned during multimodal pretraining.

• Concept Correlation across Pretraining Datasets: The distribution of concepts across different pretraining datasets is strongly correlated (see Tab. 4 in Sec. 5), which suggests web crawls yield surprisingly similar concept distributions across different pretraining data curation strategies, necessitating explicit rebalancing efforts [11, 125] (a minimal sketch of such a cross-dataset check follows this list).

• Image-Text Misalignment between Concepts in Pretraining Data: Concepts often appear in one modality but not the other, which implies significant misalignment (see Tab. 3 in Sec. 5). Our released data artifacts can help image-text alignment efforts at scale by precisely indicating the examples in which modalities misalign. Note that the log-linear trend across both modalities is robust to this misalignment.
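As a toy illustration of the cross-dataset correlation check referenced in the list above, the sketch below computes a rank correlation between two concept-frequency tables. The concept counts are invented placeholders, and this is not the paper's actual pipeline.

```python
# Minimal sketch (placeholder counts, not the paper's data) of checking whether
# two pretraining datasets rank concepts similarly, as in the correlation bullet above.
from scipy.stats import spearmanr

freq_dataset_a = {"dog": 120_000, "aardvark": 40, "guitar": 8_500, "telescope": 900}
freq_dataset_b = {"dog": 310_000, "aardvark": 95, "guitar": 21_000, "telescope": 2_400}

# Restrict to concepts present in both datasets, then compare their rank orderings.
shared_concepts = sorted(freq_dataset_a.keys() & freq_dataset_b.keys())
counts_a = [freq_dataset_a[c] for c in shared_concepts]
counts_b = [freq_dataset_b[c] for c in shared_concepts]

rho, p_value = spearmanr(counts_a, counts_b)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```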

To provide a simple benchmark for generalization performance for multimodal models, which controls for the concept frequency in the training set, we introduce a new long-tailed test dataset called “Let It Wag!”. Current models trained on both openly available datasets (e.g., LAION-2B [103], DataComp-1B [46]) and closed-source datasets (e.g., OpenAI-WIT [91], WebLI [29]) have significant drops in performance, providing evidence that our observations may also transfer to closed-source datasets. We publicly release all our data artifacts (over 300GB), amortising the cost of analyzing the pretraining datasets of multimodal foundation models for a more data-centric understanding of the properties of multimodal models in the future.

Several prior works [91, 46, 82, 42, 83, 74] have investigated the role of pretraining data in affecting performance. Mayilvahanan et al. [79] showed that CLIP’s performance is correlated with the similarity between training and test datasets. In other studies on specific areas like question-answering [62] and numerical reasoning [94] in large language models, high train-test set similarity did not fully account for observed performance levels [127]. Our comprehensive analysis of several pretraining image-text datasets significantly adds to this line of work, by (1) showing that concept frequency determines zero-shot performance and (2) pinpointing the exponential need for training data as a fundamental issue for current large-scale multimodal models. We conclude that the key to “zero-shot” generalization capabilities under large-scale training paradigms remains to be found.

[1] class categories for classification tasks, objects in the text captions for retrieval tasks, and objects in the text prompts for generation tasks, see Sec. 2 for more details on how we define concepts.


