AI Insights
2 of Wall Street’s Highest-Flying Artificial Intelligence (AI) Stocks Can Plunge Up to 94%, According to Select Analysts

Key Points
- Although artificial intelligence (AI) is forecast to boost the global economy by $15.7 trillion come 2030, optimism isn’t universal among Wall Street analysts.
- One skyrocketing AI stock, with a well-defined, sustainable moat, is trading at an unjustifiable valuation premium, based on what history tells us.
- Meanwhile, one of Wall Street’s elite trillion-dollar stocks has a terrible habit of overpromising and underdelivering.
Arguably, nothing has commanded the attention of professional and everyday investors quite like artificial intelligence (AI). In Sizing the Prize, the analysts at PwC forecast AI would provide a $15.7 trillion boost to the global economy by 2030, with $6.6 trillion tied to productivity improvements, and the remainder coming from consumption-side effects.
Excitement surrounding this technology has sent some of the market’s largest and most widely held AI stocks soaring, including AI data-mining specialist Palantir Technologies (NASDAQ: PLTR) and electric-vehicle (EV) manufacturer Tesla (NASDAQ: TSLA).
But just because these stocks have (thus far) been unstoppable doesn’t mean optimism is universal among analysts. Two Wall Street analysts, longtime bears on Palantir and Tesla, respectively, believe both companies will lose most of their value.
Image source: Getty Images.
1. Palantir Technologies: Implied downside of 72%
There’s a solid argument to be made that Palantir has been the hottest AI stock on the planet since 2023 began. Shares have rallied approximately 2,370%, with Palantir adding more than $360 billion in market value, as of the closing bell on Aug. 22.
Both of the company’s core operating segments, Gotham and Foundry, lean on AI and machine learning. Gotham is Palantir’s breadwinner. It’s used by federal governments to plan and execute military missions, as well as to collect/analyze data. Meanwhile, Foundry is an enterprise subscription service that helps businesses better understand their data and streamline their operations. Neither operating segment has a clear replacement at scale, which means Palantir offers a sustainable moat.
But in spite of Palantir’s competitive edge, RBC Capital Markets’ Rishi Jaluria sees plenty of downside to come. Even though Jaluria has raised his price target on Palantir shares twice since 2025 began, his $45 target implies downside of up to 72% over the next year.
If there’s one headwind Jaluria consistently presents when assigning or reiterating a price target on Palantir, it’s the company’s aggressive valuation. Shares closed out the previous week at a price-to-sales (P/S) multiple of roughly 117!
Historically, companies that are leaders of next-big-thing technology trends have peaked at P/S ratios of approximately 30 to 40. No megacap company has ever been able to maintain such an aggressive P/S premium. While Palantir’s sustainable moat has demonstrated it’s worthy of a pricing premium, there’s a limit as to how far this valuation can be stretched.
Jaluria has also previously cautioned that Foundry’s growth isn’t all it’s cracked up to be. Specifically, Jaluria has opined that Foundry’s tailored approach to meeting its customers’ needs will make scaling the platform a challenge. Nevertheless, Palantir’s commercial customer count surged 48% to 692 clients in the June-ended quarter from the prior-year period, which appears to be proving RBC Capital’s analyst wrong.
There’s also the possibility of Palantir stock being weighed down if the AI bubble were to burst. History tells us that every next-big-thing trend dating back three decades has undergone a bubble-bursting event early in its expansion. While Palantir’s multiyear government contracts and subscription revenue would protect it from an immediate sales decline, investor sentiment would probably clobber its stock.

Image source: Tesla.
2. Tesla: Implied downside of 94%
Over the trailing-six-year period, shares of Tesla have skyrocketed by more than 2,200%. Though Tesla hasn’t moved in lockstep with other leading AI stocks, its EVs are increasingly reliant on AI to improve safety and/or promote partial self-driving functionality.
Tesla was the first automaker in more than a half-century to successfully build itself from the ground up to mass production. It’s produced a generally accepted accounting principles (GAAP) profit in each of the last five years, and it delivered in the neighborhood of 1.8 million EVs in each of the previous two years.
In spite of Tesla’s success and it becoming one of only 11 public companies globally to have ever reached the $1 trillion valuation mark, Gordon Johnson of GLJ Research sees this stock eventually losing most of its value. Earlier this year, Johnson reduced his price target on Tesla to just $19.05 per share, which implies an up to 94% collapse.
Among the many concerns cited by Johnson is Tesla’s operating structure. Whereas other members of the “Magnificent Seven” are powered by high-margin software sales, Tesla is predominantly selling hardware that affords it less in the way of pricing power. Tesla has slashed the price of its EV fleet on more than a half-dozen occasions over the last three years as competition has ramped up.
Johnson has also been critical of Tesla’s numerous side projects, which are providing minimal value to the brand. Although energy generation and storage products have been a solid addition, the company’s Optimus humanoid robots and extremely limited robotaxi service launch have been grossly overhyped.
This builds on a larger point that Tesla CEO Elon Musk has a terrible habit of overpromising and underdelivering when it comes to game-changing innovations at his company. For instance, promises of Level 5 full self-driving have gone nowhere for 11 years, while the launch of the Cybertruck is looking more like a flop than a success.
Furthermore, Tesla’s earnings quality is highly suspect. Though the company has been decisively profitable for five straight years, more than half of its pre-tax income in recent quarters has been traced back to automotive regulatory credits and net interest income earned on its cash. In other words, a majority of Tesla’s pre-tax income derives from unsustainable and non-innovative sources that have nothing to do with its actual operations. Worse yet, President Trump’s flagship tax and spending bill, the “Big, Beautiful Bill” Act, will soon put an end to automotive regulatory credits in the U.S.
What investors are left with is an auto stock valued at north of 200 times trailing-12-month earnings per share (EPS) whose EPS has been declining with consistency for years. While Johnson’s price target appears excessively low, paying over 200 times EPS for a company that consistently underdelivers is a recipe for downside.
Sean Williams has no position in any of the stocks mentioned. The Motley Fool has positions in and recommends Palantir Technologies and Tesla. The Motley Fool has a disclosure policy.
AI Insights
LifeGPT: topology-agnostic generative pretrained transformer model for cellular automata

Codes, data, and additional animations/figures are available at https://github.com/lamm-mit/LifeGPT.
Model architecture and hardware information
LifeGPT was constructed in Python using the “x-transformers” library [65]. The models in this study were trained on a workstation equipped with a high-end CUDA-compatible GPU (RTX A4000, NVidia, Santa Clara, CA, USA) for a total of 50 epochs on a 10,000-sample training set.
Hyperparameters
Hyperparameters were initially selected heuristically for optimal performance, as the GPU primarily used for training (RTX A4000, NVidia, Santa Clara, CA, USA) had 16 GB of VRAM. Unless otherwise stated, all instances of LifeGPT used the following set of hyperparameters during training, as described in Table 1. The batch size was initially set to 20 samples and was decreased to 5 samples for later versions of LifeGPT due to memory limitations encountered when using FCM (see ”Forgetful causal masking (FCM) implementation”).
Datasets
Data generation overview
To generate training sets, validation sets, and testing sets, the same basic strategy was used. First, IC game-states were generated stochastically as 2D, 32 × 32 NumPy arrays. Depending on the exact algorithm used, the generated IC game-states collectively formed either high-entropy or broad-entropy datasets. Next, a custom Life Python class was used to generate the corresponding NGS for every previously generated IC. Lastly, each IC and its corresponding NGS were concatenated within a string. Every generated pair was subsequently stored within a dataframe for future retrieval.
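A minimal sketch of this pipeline is shown below; the helper names (generate_ic, build_dataset) and the caller-supplied Life update function are our own placeholders, not the repository’s Life class or generate_sets() implementation.

```python
import numpy as np
import pandas as pd

def generate_ic(p_one: float = 0.5, width: int = 32) -> np.ndarray:
    # Stochastic 32 x 32 binary IC; p_one is the probability of a cell being 1.
    return (np.random.random((width, width)) < p_one).astype(np.uint8)

def build_dataset(n_samples: int, next_game_state) -> pd.DataFrame:
    # next_game_state: a callable applying Life rules on a toroidal grid
    # (standing in for the paper's Life class / game.py).
    rows = []
    for _ in range(n_samples):
        ic = generate_ic()
        ngs = next_game_state(ic)
        # Flatten each grid and store the IC/NGS pair as strings of 0s and 1s.
        rows.append({
            "ic": "".join(map(str, ic.flatten())),
            "ngs": "".join(map(str, ngs.flatten())),
        })
    return pd.DataFrame(rows)  # retained in a dataframe for future retrieval
```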
Data topology
Transformer models are architected to process data as 1D arrays. Therefore, to teach LifeGPT the rules of a 2D CA algorithm, such as Life, the 2D data from each time slice of the game had to be flattened into a 1D array. In this way, LifeGPT functioned similarly to a vision transformer, in which 2D data is flattened into a 1D array whose entries are tokenizable image patches [26]. However, due to the low resolution of the 32 × 32 toroidal grid on which Life was simulated to generate our training data, we were able to encode every pixel of each time slice of the game in a 1D array (as opposed to grouping pixels into patches).
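As a brief illustration of this per-pixel flattening (not taken from the repository), row-major flattening maps cell (i, j) of the 32 × 32 torus to position i · 32 + j in the 1D sequence:

```python
import numpy as np

grid = np.random.randint(0, 2, size=(32, 32), dtype=np.uint8)  # one time slice of Life

# Row-major flattening: cell (i, j) on the 2D torus maps to index i * 32 + j
# in the 1D sequence, so every pixel becomes its own token (no patching).
seq = grid.flatten()
assert seq[5 * 32 + 7] == grid[5, 7]
```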
Instruction Tuning
In order to encode the time progression of the game into the training set, the initial-state and next-state 1D arrays were placed within a prompt string, which was subsequently tokenized to form a vector. Specifically, both 1D arrays were converted to strings and placed within a larger string containing start and end tokens (@ and $, respectively), a task statement, and bracket delimiters (e.g., “@PredictNextState…”).
Tokenization
We employed a byte-level tokenizer that operates on UTF-8 encoded text. UTF-8 is a variable-width character encoding capable of representing every character in the Unicode standard, which allows the tokenizer to process a wide range of scripts, symbols, and special characters uniformly. By converting the text into its byte-level representation, our approach ensures consistent tokenization across different languages and handles out-of-vocabulary words and non-standard text, such as emojis or code, effectively. This method allows for robust and flexible processing of diverse textual data. Tokenization resulted in a vector suitable as input to the embedding layer of the transformer model.
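The following is a minimal illustration of byte-level UTF-8 tokenization; the prompt string shown is purely illustrative, as the exact prompt format and token-ID mapping used by LifeGPT are not reproduced here.

```python
def byte_tokenize(text: str) -> list[int]:
    # Byte-level tokenization: every UTF-8 byte maps to one token ID in [0, 255].
    return list(text.encode("utf-8"))

def byte_detokenize(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")

# Illustrative prompt only -- the actual task statement and delimiters follow the paper.
prompt = "@PredictNextState<0101...>$"
token_ids = byte_tokenize(prompt)        # vector fed to the embedding layer
assert byte_detokenize(token_ids) == prompt
```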
Training set generation
High-entropy IC set generation
High-entropy IC game-states were generated by effectively flipping a coin 1024 times to designate the states (0 or 1) on a 32 × 32 grid. When considering the configuration space of a binary 2D array \(M\in {\{0,1\}}^{32\times 32}\), the following formula describes its Shannon entropy [66] (informational entropy):
$$H(M)=-\sum _{x\in \{0,1\}}{p}_{x}{\log }_{2}{p}_{x}$$
(1)
(This is also known as the binary entropy function [67]), where \({p}_{x}\) is the probability of finding the value x in the 32 × 32 array M. \({p}_{x}\) is defined as:
$${p}_{x}=\frac{1}{3{2}^{2}}\mathop{\sum }\limits_{i=1}^{32}\mathop{\sum }\limits_{j=1}^{32}{\delta }_{{M}_{ij},x}$$
(2)
where \({M}_{ij}\) is the element of M in the ith row and jth column, and \({\delta }_{{M}_{ij},x}\) is the Kronecker delta function, which is equal to 1 if \({M}_{ij}=x\) and 0 otherwise.
Thus, for a “50–50 coin toss” scenario (\({p}_{0}={p}_{1}=\frac{1}{2}\)), H(M) is at its maximum and is equal to 1 Sh. Moreover, since binary data necessitates the condition \({p}_{0}+{p}_{1}=1\), only one probability value is needed to fully describe the entropy of a given array M. We therefore denote the ordering of a given IC by a single order parameter, η, where η = \({p}_{1}\). When considering the order parameter of a set of ICs, it is important to note that, because IC generation is always a stochastic process, the exact η of any given IC in the set cannot be predicted with certainty. For this reason, we characterize IC sets with the symbol 〈η〉, denoting the expected order parameter.
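For concreteness, a short sketch (not from the repository) of computing η and H(M) for a 32 × 32 binary array might look as follows:

```python
import numpy as np

def order_parameter(M: np.ndarray) -> float:
    # eta = p_1, the fraction of cells equal to 1 (the Kronecker-delta sum in Eq. (2)).
    return float(M.mean())

def shannon_entropy(M: np.ndarray) -> float:
    # Per-cell binary Shannon entropy of the grid, in shannons (Eq. (1)).
    p1 = order_parameter(M)
    p0 = 1.0 - p1
    return -sum(p * np.log2(p) for p in (p0, p1) if p > 0)

M = (np.random.random((32, 32)) < 0.5).astype(np.uint8)
print(order_parameter(M), shannon_entropy(M))  # eta near 0.5, H(M) near 1 Sh
```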
To generate high-entropy ICs, a binary array was constructed by checking whether random.random() < 0.5 (using the “random” module in Python—see https://python.readthedocs.io/en/latest/library/random.html) for each element. If the comparison returned True, the element was set to 1, and otherwise, 0. This method resulted in a training set with a binomial, experimentally measured η distribution (Fig. 5A).
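A minimal sketch of this procedure, using the same random module, is:

```python
import random

def high_entropy_ic(width: int = 32) -> list[list[int]]:
    # Each cell is an independent fair coin flip: 1 if random.random() < 0.5, else 0.
    return [[1 if random.random() < 0.5 else 0 for _ in range(width)]
            for _ in range(width)]
```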
Broad-entropy IC set generation
To create a broad-entropy IC set, first, a vector was created representing a set of order parameters ranging from 0 to 1. The length of this vector was set to the desired number of samples in the dataset (10,000 for training, 1000 for validation). This set of order parameters may be thought of as containing different expected probabilities for finding a 1 in an IC.
Then, the same procedure as for the high-entropy IC set was followed, with two exceptions: (1) instead of random.random() < 0.5, the comparison random.random() < η determined the value of each element in each IC array, and (2) each IC was generated using a unique η from the aforementioned vector (see “Training set generation”). This strategy ensured that the IC set represented a broad range of ordering, from all 0s, to 50–50 0s and 1s, to all 1s (Fig. 5B).
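A corresponding sketch for the broad-entropy set, sweeping η linearly across samples (the function name is ours), might look like this:

```python
import random

def broad_entropy_ic_set(n_samples: int = 10_000, width: int = 32) -> list[list[list[int]]]:
    # Sweep the order parameter eta linearly from 0 to 1, one eta per IC, so the
    # set spans all-0 grids through 50-50 grids to all-1 grids.
    etas = [i / (n_samples - 1) for i in range(n_samples)]
    return [[[1 if random.random() < eta else 0 for _ in range(width)]
             for _ in range(width)]
            for eta in etas]
```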
Next-game-state generation
NGSs were calculated from IC arrays by applying Life rules assuming a toroidal grid (see the update_grid() function here: game.py).
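The repository’s update_grid() is not reproduced here, but a sketch consistent with the description (toroidal wrapping via np.roll) is:

```python
import numpy as np

def next_game_state(grid: np.ndarray) -> np.ndarray:
    # Life update on a toroidal grid: np.roll wraps neighbours around the edges,
    # so opposite borders are treated as adjacent.
    neighbours = sum(
        np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        for di in (-1, 0, 1) for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    born = (grid == 0) & (neighbours == 3)
    return (survive | born).astype(np.uint8)
```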
Reshaping data
To make the handling of training set data easier, the final stage of the training set generator involves reshaping the data into a list of sub-lists, in which each entry in the list contains a sub-list corresponding to a specific IC. Within each unique sub-list, two strings are stored, one corresponding to a flattened IC, and one corresponding to a flattened NGS (see the generate_sets() function here: game.py).
Validation set generation
Validation sets were generated using the same methods as in “Training set generation,” since the random.random() function ensures sufficiently random IC generation, keeping training and validation sets entirely independent. Combined with the incredibly large space of possible 32 × 32 binary arrays (\({2}^{32\times 32}\approx 1.80\times {10}^{308}\) unique possibilities), this made the likelihood of even a single sample being identical between a 10,000-sample training set and a 1000-sample validation set negligible (see “Learning abilities”). This, in turn, ensured that over the course of model training, training loss and validation loss remained independent of one another.
Testing set generation
A 10-sample testing set was constructed to validate the performance of models during and after training, in a manner other than by inspecting the validation and training losses. Five samples in the testing set were generated stochastically in the same manner as in “Training set generation,” and 5 samples were manually defined to match known periodic and complex patterns found in Life (Fig. 3). NGSs were recursively generated for a total of 10 states (including the IC) per sample, for all 10 samples in the testing set.
Dataset generation for differently sized grids
For the LifeGPT-MultiGrid datasets (training, validation, and testing; see “Learning life on differently sized grids”), the only differences in the procedure were to specify different grid sizes (\({W}_{G}\in \{2,4,8,16\}\)) during IC generation and to introduce a padding character (“p”), which was appended as many times as needed to the end of each sub-list whose grid size was smaller than the largest specified grid size, such that all sub-lists were the same length.
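One plausible reading of this padding scheme, sketched below with an assumed helper name, pads the flattened IC and NGS strings of each sub-list out to the length implied by the largest grid size:

```python
def pad_flattened_pair(flat_ic: str, flat_ngs: str, grid_width: int, max_width: int = 16) -> list[str]:
    # Pad the flattened IC and NGS strings with "p" so every sub-list matches the
    # length implied by the largest grid size (here assumed to be 16 x 16).
    pad = "p" * (max_width ** 2 - grid_width ** 2)
    return [flat_ic + pad, flat_ngs + pad]
```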
Forgetful causal masking (FCM) implementation
FCM was implemented using the “x-transformers” library [65], where it is built into the AutoregressiveWrapper class by default. FCM was enabled by setting mask_prob to 0.15, which was empirically shown to be effective by Liu et al. [68].
In addition to standard causal attention masking, FCM randomly masks a predetermined percentage of past tokens during the learning process. The authors [68] argue that this method prevents over-attending to more recent tokens in a given sequence, encouraging attention to tokens in the “distant past.” Incorporating FCM into our model increased the rate at which accuracy improved with each epoch. Furthermore, FCM enabled our model to achieve 100% accuracy on our testing set with a sampling temperature of 1.0 in fewer than 50 epochs, which was previously unattainable when training with a broad-entropy dataset.
Implementing FCM increased the GPU RAM requirements of LifeGPT, necessitating a decrease in batch size from 20 to 5 samples.
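A minimal training-setup sketch using the x-transformers API is shown below; mask_prob = 0.15 and the batch size of 5 follow the text, while the model dimensions and sequence length are placeholders (Table 1 is not reproduced here).

```python
import torch
from x_transformers import TransformerWrapper, Decoder, AutoregressiveWrapper

# Model dimensions and sequence length below are placeholders (Table 1 is not
# reproduced here); mask_prob = 0.15 enables FCM as described in the text.
model = TransformerWrapper(
    num_tokens=256,                  # byte-level vocabulary
    max_seq_len=2304,                # assumed: long enough for task tokens + IC + NGS
    attn_layers=Decoder(dim=256, depth=12, heads=8),
)
model = AutoregressiveWrapper(model, mask_prob=0.15)

tokens = torch.randint(0, 256, (5, 2304))  # batch of 5 (reduced from 20 for FCM memory)
loss = model(tokens)                       # autoregressive loss with FCM applied
loss.backward()
```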
Model development
Training was initially conducted with high-entropy data. Due to the (pseudo)random nature of our training set generation script (see “Training set generation”) and the high number of samples in the training set (10,000), there was some diversity of training-data entropy despite the use of a static order parameter (η = 0.5) (Fig. 5A). Nevertheless, observed model accuracy issues for low-entropy ICs prompted the use of broad-entropy datasets (Fig. 5B), which resulted in improved performance. Later, LifeGPT-MultiGrid (see “Learning life on differently sized grids”) was developed using a modified dataset to show that the LifeGPT framework allows simultaneous learning of multiple grid sizes.
Accuracy benchmarking
The testing dataset consisted of 10 flattened 32 × 32 binary arrays, representing initial states in Life, and their resulting iterations (numbered one through ten) in accordance with Life state-transition rules on a toroidal (periodic) grid. Depending on the type of model being trained (the number of desired time-step jump predictions), different columns in the testing dataset were selected as the ground truth. Accuracy at each checkpoint (every 2 epochs, starting with epoch 2) was determined by inputting the task statement (e.g., “@PredictNextState…”) and the corresponding IC for each testing sample, then comparing the model’s predicted NGS with the ground truth:
$$A=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{\delta }_{{y}_{i}{\hat{y}}_{i}}$$
(3)
where A is the accuracy of the model, N is the total number of cell predictions across the testing dataset (N = 32 × 32 × 10 = 10,240 cells for a dataset with ten pairs of 32 × 32 grid game examples), \({y}_{i}\) is the ground-truth value, \({\hat{y}}_{i}\) is the predicted value, and δ is the Kronecker delta function, which equals 1 if \({y}_{i}={\hat{y}}_{i}\) and 0 otherwise. An accuracy score was computed once every 2 epochs, starting with epoch 2, for each model sampling temperature in {0, 0.25, 0.5, 0.75, 1}.
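Equation (3) reduces to a mean of cell-wise matches; a minimal sketch:

```python
import numpy as np

def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Eq. (3): mean Kronecker delta over all N cell predictions in the testing set.
    return float((y_true == y_pred).mean())

# Ten 32 x 32 ground-truth and predicted grids stacked into (10, 32, 32) arrays
# yield N = 10,240 cell-level comparisons.
```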
Training set entropy effects experimental procedure
The goal of this experiment was to determine what effect, if any, the ordering of the ICs making up LifeGPT’s training data has on accuracy (A) when the model is fed ICs generated with varying expected order parameters (〈η〉IC). We used two versions of LifeGPT: one trained on high-entropy training data and the other on broad-entropy training data. Next, a broad-entropy testing set (110 samples, with 〈η〉IC values ranging linearly from 0 to 1) was generated in the same manner as the broad-entropy training set. The stochasticity of the IC generation process ensured that both broad-entropy sets remained independent. Both models were then benchmarked on each sample in a manner similar to the method in “Accuracy benchmarking and sampling temperature effects,” the only difference being that A was calculated for each sample in the testing set rather than as an average over all samples. Finally, A versus 〈η〉IC was plotted for both models (see Fig. 4).
Autoregressive loop implementation
The autoregressive loop is simply an implementation of LifeGPT in which the model is placed inside a loop: a portion of its output, corresponding to the NGS, is converted into an input tensor and fed back into LifeGPT for a desired number of iterations. As such, the NGS output of one loop iteration serves as the IC of the next. In this way, the autoregressive loop is able to “run” Life in a recursive manner similar to the original algorithm. We ran the autoregressive loop using two versions of LifeGPT trained on the broad-entropy training set: one that stopped training at epoch 16 (chosen because this version was the earliest instance of A = 1.0 for sampling temperature = 1) and one that continued training until epoch 50, across sampling temperatures 0, 0.25, 0.5, 0.75, and 1. We compared the NGSs output by our autoregressive loop with the ground-truth NGSs generated with the Life algorithm, and created animations for all model-sampling temperature combinations showing the progression of the ground-truth Life system, the autoregressive-loop-generated NGSs, and the discrepancy between the two.
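A schematic version of the loop, with predict_ngs standing in for the prompt-construction, sampling, and NGS-extraction steps around LifeGPT, is:

```python
def run_autoregressive_loop(predict_ngs, ic, n_iterations: int) -> list:
    # predict_ngs stands in for LifeGPT inference: build the prompt from the
    # current game state, sample the model, and extract the predicted NGS.
    states = [ic]
    for _ in range(n_iterations):
        ngs = predict_ngs(states[-1])  # model output for the current state
        states.append(ngs)             # previous NGS becomes the next IC
    return states
```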
We also ran the autoregressive loop (and the Life algorithm) for 249 iterations (resulting in 250 game states, including the ICs), using only the epoch 50, sampling temperature = 0 version of LifeGPT due to time and compute constraints, for all 10 samples in the testing set. For each game state, we compared LifeGPT’s predictions to the GT Life algorithm’s output using the metric “Error Rate,” defined as:
$${\rm{Error}}\,{\rm{Rate}}=1-\frac{1}{G}\mathop{\sum }\limits_{i=1}^{G}{\delta }_{{y}_{i}{\hat{y}}_{i}}$$
(4)
where Error Rate is the fraction of cells the model predicts incorrectly, G is the total number of cells comprising each game state (G = 32 × 32 = 1024 cells), \({y}_{i}\) is the ground-truth value, \({\hat{y}}_{i}\) is the predicted value, and δ is the Kronecker delta function.
LifeGPT-multigrid experimental procedure
Accuracy characterization was performed in the same manner as described in “Accuracy benchmarking and sampling temperature effects,” aside from the use of a different testing dataset. A testing set of 100 samples (25 samples per \({W}_{G}\), for \({W}_{G}\in \{2,4,8,16\}\)) was created using broad-entropy IC generation. Inference was performed for each sample, and average accuracies were calculated for each 25-sample group in accordance with equation (3).
Use of generative AI
Some Python scripts used for data generation, model training, data processing, and figure generation were written with the assistance of GPT-3.5, GPT-4, and GPT-4o from OpenAI. All scripts generated or edited in this manner were carefully reviewed and validated by an author, and manually corrected where errors were found, prior to implementation in our work.
AI Insights
OpenAI Plans India Data Center in Major Stargate Expansion

OpenAI is seeking to build a massive new data center in India that could mark a major step forward in Asia for its Stargate-branded artificial intelligence infrastructure push.
AI Insights
Scaling Healthcare Operations: Unlocking new capacity with artificial intelligence – Modern Healthcare