Formulation of logic rule explanations
For a given input x from the dataset X, our explanation α = (αx, αw) comprises two components: an antecedent αx and a linear weight αw. The antecedent αx and a consequent y together form a logic rule αx ⇒ y, as illustrated in Fig. 1a. The linear weight αw indicates the contribution of the atoms in the antecedent αx within the logic rule. The meanings of the symbols used in this paper are defined in Section C.1 of the Supplementary Information.
An antecedent αx represents the condition under which a rule applies and corresponds to an explanation expressed in logical form. It is defined as a sequence αx = (o1,…, oL), where each oi is an atom, and L is the length of the sequence. An atom is the smallest unit of explanation and corresponds to a single interpretable feature of a given input—for instance, a condition such as “awesome ≥2”. These interpretable features may differ from the features used by deep learning models. They can have different granularities (e.g., words or phrases vs. tokens), be based on statistical properties (e.g., word frequency), or be derived using external tools (e.g., grammatical tagging of a word). Mathematically, each atom oi is a Boolean-valued function that returns true if the ith interpretable feature is present in the input x, and false otherwise. Additional details about the atom selection process are provided in Section C.3 of the Supplementary Information. An input sample x is said to satisfy an antecedent αx if the logical condition αx(x) evaluates to true.
The consequent y denotes the model’s predicted output, given that the antecedent is satisfied. In a classification task, y typically corresponds to the predicted class label; in regression, y would be a real-valued number.
Finally, the linear weight αw models the contribution of each atom in the logical relationship αx ⇒ y. It is represented as a matrix \({{\boldsymbol{\alpha }}}_{w}=\left(\begin{array}{ccc}{w}_{11}&\ldots &{w}_{1L}\\ \ldots &\ldots &\ldots \\ {w}_{K1}&\ldots &{w}_{KL}\end{array}\right)\), where K is the number of possible classes and wki indicates the contribution of atom oi to the prediction of class k. The magnitude of each weight reflects the strength of its corresponding atom’s contribution to the output prediction.
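To make the formulation concrete, the following minimal sketch shows one possible way to represent count-based atoms and to test whether an input satisfies an antecedent, assuming the antecedent is the conjunction of its atoms; the helper names and the word-count atom format are illustrative assumptions rather than a prescribed interface.

```python
from collections import Counter

def make_count_atom(word, min_count):
    """Boolean-valued atom o_i: true iff `word` occurs at least `min_count` times in the input."""
    def atom(text):
        return Counter(text.lower().split())[word] >= min_count
    atom.name = f"{word} >= {min_count}"
    return atom

def satisfies(antecedent, text):
    """An input satisfies alpha_x when every atom evaluates to true (conjunction assumed)."""
    return all(atom(text) for atom in antecedent)

# Example antecedent: ("awesome >= 2", "tasty >= 1")
alpha_x = [make_count_atom("awesome", 2), make_count_atom("tasty", 1)]
print(satisfies(alpha_x, "awesome food and awesome service with a tasty dessert"))  # True
```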
Framework for deep logic rule reasoning
Let f denote a deep learning model that estimates the probability p(y∣x), where x is the input data sample and y is a candidate class. We upgrade model f to a self-explaining version by adding a logic rule explanation α. Then, we can reformulate p(y∣x) as
$$p({y}| {\bf{x}},b)=\sum _{{\boldsymbol{\alpha }}}p({y}| {\boldsymbol{\alpha }},{\bf{x}},b)p({\boldsymbol{\alpha }}| {\bf{x}},b)=\sum _{{\boldsymbol{\alpha }}}p({y}| {\boldsymbol{\alpha }})p({\boldsymbol{\alpha }}| {\bf{x}},b),\quad s.t.,\quad \Omega ({\boldsymbol{\alpha }})\le S$$
(3)
Here, b represents a human’s prior belief about the rules, e.g., the desirable form of atoms, Ω(α) is the number of logic rules required to explain a given input x, and S is the number of samples (logic rules chosen by the model). Eq. (3) includes two constraints essential for ensuring explainability. The first constraint, p(y∣α, x, b) = p(y∣α), requires that the explanation α contain all information in the input x and the belief b that is useful for predicting y. Without this constraint, the model may “cheat” by predicting y directly from the input instead of using the explanation, which decreases faithfulness. The second constraint, Ω(α) ≤ S, requires that the model be well explained using only S explanations, where S is small enough to ensure readability (S = 1 in our implementation). We can further decompose Eq. (3) based on the independence between the input x and the human prior belief b (proof and assumptions in Section C.2 of the Supplementary Information):
$$p({y}| {\bf{x}},b)=\sum _{{\boldsymbol{\alpha }}}p({y}| {\boldsymbol{\alpha }})p({\boldsymbol{\alpha }}| {\bf{x}},b)\propto \sum _{{\boldsymbol{\alpha }}}p(b| {\boldsymbol{\alpha }})\cdot p({y}| {\boldsymbol{\alpha }})\cdot p({\boldsymbol{\alpha }}| {\bf{x}}),\,\,s.t.,\,\,\Omega ({\boldsymbol{\alpha }})\le S$$
(4)
Then, we further decompose Eq. (4) using an antecedent αx and its linear weight αw:
$$\begin{array}{rcl} p(y|{\mathbf{x}},b)&\propto & \sum\limits_{{\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w} p(b | {\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w)\cdot p(y | {\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w)\cdot p({\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w | {\mathbf{x}}) \\ &=& \sum\limits_{{\boldsymbol{\alpha}}_x} p(b | {\boldsymbol{\alpha}}_x) \left(\sum\limits_{{\boldsymbol{\alpha}}_w} p(y | {\boldsymbol{\alpha}}_w) \cdot p({\boldsymbol{\alpha}}_w | {\boldsymbol{\alpha}}_x) \right) \cdot\ {p({\boldsymbol{\alpha}}_x | {\mathbf{x}})}, \\ &=& \sum\limits_{{\boldsymbol{\alpha}}_x} \underbrace{p(b | {\boldsymbol{\alpha}}_x)}_{\begin{array}{c}{\rm{Human}}\\ {\rm{prior}}\end{array}} \cdot \underbrace{p(y | {\boldsymbol{\alpha}}_x)}_{\begin{array}{c}{\rm{Consequent}}\\ {\rm{estimation}}\end{array}} \ \cdot\ \ {\underbrace{p({\boldsymbol{\alpha}}_x | {\mathbf{x}})}_{\begin{array}{c}{\rm{Deep}}\,{\rm{antecedent}}\\ {\rm{generation}}\end{array}}}, \quad s.t., \quad {{\Omega}}({\boldsymbol{\alpha}}_x) \leq S \end{array}$$
(5)
In Eq. (5), two additional constraints are introduced to ensure that the weight αw functions as a faithful explanation. The first constraint, p(y∣αw) = p(y∣αx, αw), is designed to prevent the model from bypassing the explanatory weight αw and relying instead on latent representations of the antecedent αx for predicting the consequent y. The second constraint, p(αw∣αx) = p(αw∣αx, x), ensures that the estimation of αw is based solely on the selected antecedent αx and not directly on the raw input x. This guards against information leakage that could undermine the interpretability of the explanation.
We assume p(b∣αx) = p(b∣αx, αw), as b represents a human’s prior belief about the rules encoded in αx, which should not depend on how the model weights them internally. The only difference between Eq. (5) and Eq. (4) lies in the use of the antecedent αx instead of the full explanation α. This implies that introducing the weight αw only affects the internal estimation of the consequent, and without explicit guidance, this process may diverge significantly from human expectations.
The three derived terms correspond to three main modules of the proposed framework, SELOR. The first component, human prior p(b∣αx), encodes human guidance on preferred rule forms, aiming to reduce the likelihood of misunderstanding, as discussed in Section “Human Prior p(b|αx)”. The second, consequent estimation p(y∣αx), models the relationship between the explanation αx and the predicted output y through the use of the weight αw. This weight is carefully estimated to ensure a meaningful and consistent relationship, so that each explanation naturally leads to the prediction according to human perception, as described in Section “Consequent Estimation p(y|αx)”. Lastly, deep antecedent generation p(α∣x) leverages the deep representation of input x learned by the given deep model f to infer an appropriate explanation α, as elaborated in Section “Deep Antecedent Generation p(α|x)”.
The sparsity constraint Ω(αx)≤S for the explanations can be enforced by sampling from p(αx∣x). In particular, we rewrite Eq. (5) as an expectation and estimate it through sampling:
$$\begin{array}{lll}p({y}| {\bf{x}},b)\;\propto \;\sum\limits_{{{\boldsymbol{\alpha }}}_{x}}p(b| {{\boldsymbol{\alpha }}}_{x})\cdot p({y}| {{\boldsymbol{\alpha }}}_{x})\cdot p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})\\\qquad\qquad =\mathop{{\mathbb{E}}}\limits_{{{\boldsymbol{\alpha }}}_{x} \sim p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})}p(b| {{\boldsymbol{\alpha }}}_{x})\cdot p({y}| {{\boldsymbol{\alpha }}}_{x})\approx \frac{1}{S}\sum\limits_{\begin{array}{c}s\in [1,S]\\ {{\boldsymbol{\alpha }}}_{x}^{(s)} \sim p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})\end{array}}p(b| {{\boldsymbol{\alpha }}}_{x}^{(s)})\,p({y}| {{\boldsymbol{\alpha }}}_{x}^{(s)})\end{array}$$
(6)
where \({{\boldsymbol{\alpha }}}_{x}^{(s)}\) is the sth sample of αx. For example, to maximize the approximation term with S = 1, the antecedent generator p(αx∣x) must find a single sample \({{\boldsymbol{\alpha }}}_{x}^{(s)}\) that yields the largest \(p(b| {{\boldsymbol{\alpha }}}_{x}^{(s)})p({y}| {{\boldsymbol{\alpha }}}_{x}^{(s)})\), and it needs to assign a high probability to the best \({{\boldsymbol{\alpha }}}_{x}^{(s)}\). Otherwise, other samples with a lower \(p(b| {{\boldsymbol{\alpha }}}_{x}^{(s)})p({y}| {{\boldsymbol{\alpha }}}_{x}^{(s)})\) may be generated, thereby decreasing p(y∣x, b). This ensures the sparsity of p(αx∣x), which improves the model interpretability.
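As a rough illustration of the sampled estimate in Eq. (6), the sketch below averages p(b∣αx)p(y∣αx) over S sampled antecedents; sample_antecedent, prior, and consequent are placeholder callables standing in for the modules described in the following sections.

```python
import torch

def estimate_p_y(x, sample_antecedent, prior, consequent, S=1):
    """Monte Carlo estimate of p(y | x, b) in Eq. (6) (placeholder callables).

    sample_antecedent(x) -> one antecedent alpha_x^(s) drawn from p(alpha_x | x)
    prior(alpha_x)       -> p(b | alpha_x), a scalar (1.0 when only hard priors are used)
    consequent(alpha_x)  -> p(y | alpha_x), a tensor of shape (K,)
    """
    samples = [prior(a) * consequent(a) for a in (sample_antecedent(x) for _ in range(S))]
    return torch.stack(samples).mean(dim=0)  # average over the S sampled rules
```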
Human prior p(b∣αx)
Human prior p(b∣αx) = ph(b∣αx)ps(b∣αx) consists of hard priors ph(b∣αx) and soft ones ps(b∣αx).
Hard priors delimit the feasible solution space for the rules: ph(b∣αx) = 0 if αx is not a feasible solution. Humans can easily define hard priors on αx by choosing the atom types, such as whether the interpretable features are words, phrases, or statistics like word frequency, and the antecedent’s maximum length L. SELOR does not require a predefined rule set; nonetheless, we allow users to provide one if it is more desirable in some application scenarios. A large solution space increases the time cost of deep logic rule reasoning (Section “Optimization and Time Complexity”) but also decreases the probability of introducing undesirable bias.
Soft priors model different levels of human preference for logic rules. For example, people may prefer shorter rules or high-coverage rules that satisfy many input samples. The energy function can parameterize such soft priors: \({p}_{s}(b| {{\boldsymbol{\alpha }}}_{x})\propto \exp (-{{\mathcal{L}}}_{b}({{\boldsymbol{\alpha }}}_{x}))\), where \({{\mathcal{L}}}_{b}\) is the loss function for punishing undesirable logic rules. We do not include any soft priors in our current implementation.
For example, suppose we are inducing logic rules to explain a sentiment classifier’s decision on restaurant reviews. The interpretable features αx may include binary indicators for the presence of words like “awesome”, “tasty”, or “not” in the input text. A hard prior ph(b∣αx) may rule out any rule that includes more than L = 2 words in the antecedent (e.g., a rule using “awesome” and “not” is allowed, but not one using “awesome”, “not”, and “tasty” together), if the user has defined a maximum antecedent length of 2 as part of their hard prior. A soft prior ps(b∣αx) can reflect a user’s preferences over logic rules. For instance, if a user prefers commonly used words, a rule like “awesome ≥ 1” may be favored over “pulchritudinous ≥1”, even though both convey a positive meaning.
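A minimal sketch of how such priors could be encoded, assuming atoms are (word, minimum count) pairs, a maximum antecedent length of L = 2 as the hard prior, and a hypothetical frequency-based soft prior (recall that our implementation uses hard priors only):

```python
import math

def hard_prior(antecedent, max_len=2):
    """p_h(b | alpha_x) in {0, 1}: feasible only if the rule respects the user-defined length limit."""
    return 1.0 if len(antecedent) <= max_len else 0.0

def soft_prior(antecedent, word_freq):
    """Unnormalized p_s(b | alpha_x) ∝ exp(-L_b(alpha_x)); here L_b penalizes rare words (illustrative)."""
    loss_b = sum(-math.log(word_freq.get(word, 1e-6)) for word, _ in antecedent)
    return math.exp(-loss_b)

print(hard_prior([("awesome", 1), ("not", 1), ("tasty", 1)]))  # 0.0: exceeds the maximum length of 2
```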
Consequent estimation p(y∣αx)
Consequent estimation models p(y∣αx), the relationship between the antecedent αx and the prediction y using the weight αw. The weight αw is computed to ensure a meaningful and consistent relationship, so that each explanation naturally leads to the prediction according to human perception. This is achieved by testing the logic rule αx ⇒ y across the entire training dataset, ensuring that it represents the human knowledge embedded in the data distribution.
A straightforward way to compute p(y∣αx) is an empirical estimation: first, collect all samples that satisfy antecedent αx, and then calculate the percentage of them that have label y (ref. 30). For example, given explanation αx = “awesome ≥2”, if we obtain all instances in which awesome appears more than twice and find that 90% of them have label y = positive sentiment, then p(y∣αx) = 0.9. A large p(y∣αx) corresponds to global patterns that naturally align with human perception. Mathematically, this is equivalent to approximating p(y∣αx) with the empirical probability \(\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x})\):
$$\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x})={n}_{{{\boldsymbol{\alpha }}}_{x},y}/{n}_{{{\boldsymbol{\alpha }}}_{x}}$$
(7)
where \({n}_{{{\boldsymbol{\alpha }}}_{x},y}\) is the number of training samples that satisfy the antecedent αx and have the consequent y, and \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) is the number of training samples that satisfy the antecedent αx. Directly setting p(y∣αx) to \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) causes three problems. First, when \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) is not large enough, the empirical probability \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) may be inaccurate, and this formulation has no way to model such uncertainty. Second, statistically modeling the probability of y based solely on αx, without detailing the contribution of each atom in αx, may leave users feeling that part of the explanation remains unaddressed. Third, computing \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) for every antecedent is intractable, since the number of feasible antecedents grows exponentially with the antecedent length L.
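A naive transcription of the empirical estimate in Eq. (7) over the training set is sketched below (atoms are assumed to be Boolean callables, as in the earlier sketch); enumerating every antecedent this way is precisely the intractability noted above.

```python
def empirical_probability(antecedent, train_data, target_label):
    """hat{p}(y | alpha_x) = n_{alpha_x, y} / n_{alpha_x} over (text, label) training pairs."""
    covered = [(x, y) for x, y in train_data if all(atom(x) for atom in antecedent)]
    n_alpha = len(covered)                     # coverage n_{alpha_x}
    if n_alpha == 0:
        return 0.0, 0                          # undefined for zero coverage; caller must handle this
    n_alpha_y = sum(1 for _, y in covered if y == target_label)
    return n_alpha_y / n_alpha, n_alpha
```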
To address these problems, we employ a neural estimator of a categorical distribution, which jointly models \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) and the uncertainty caused by low-coverage antecedents. For example, suppose the antecedent αx is “tasty ≥2”. If this antecedent is satisfied by only 3 training samples, among which 2 have the label y = negative sentiment, then the empirical estimate is \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})=2/3\). However, since \({n}_{{{\boldsymbol{\alpha }}}_{x}}=3\) is small, the model considers this estimate uncertain, and the resulting p(y∣αx) is smoothed toward a more uniform distribution according to the learned β. In contrast, for a high-coverage antecedent like “great ≥1”, if 900 out of 1000 samples have positive sentiment, the empirical estimate of 0.9 is trusted more, and p(y∣αx) stays close to 0.9. See Section “Explainability Evaluation on Data Consistency” for the approximation capability of our model.
Assume that, given antecedent αx, the class y follows a categorical distribution, where each category corresponds to a class. We define β as the concentration hyperparameter of this categorical distribution, which controls how uniformly the probability is distributed across the classes – higher values of β lead to more uniform distributions, while lower values concentrate the probability on fewer classes. Then, according to the posterior predictive distribution, y takes one of K potential classes, and we may compute the probability of a new observation y given existing observations:
$$p(y| {{\boldsymbol{\alpha }}}_{x})=p({y}| {{\mathcal{Y}}}_{{{\boldsymbol{\alpha }}}_{x}},\beta )\approx \frac{\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x}){n}_{{{\boldsymbol{\alpha }}}_{x}}+\beta }{{n}_{{{\boldsymbol{\alpha }}}_{x}}+K\beta }$$
(8)
Here, \({{\mathcal{Y}}}_{{{\boldsymbol{\alpha }}}_{x}}\) denotes \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) observations of class label y obtained by checking the training data, and β is automatically trained. Eq. (8) becomes Eq. (7) when \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) increases to ∞, and becomes a uniform distribution when \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) goes to 0. Thus, a low-coverage antecedent with a small \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) is considered uncertain (i.e., close to uniform distribution). By optimizing Eq. (8), our method automatically balances the empirical probability \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) and the number of observations \({n}_{{{\boldsymbol{\alpha }}}_{x}}\). Probability p(y∣αx) also serves as the confidence score for the logic rule αx ⇒ y.
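Eq. (8) amounts to additive smoothing of the empirical estimate toward the uniform distribution; a direct transcription, reusing the low- and high-coverage examples above (β = 1 is an arbitrary illustrative value):

```python
def smoothed_probability(p_hat, n_alpha, K, beta):
    """Posterior predictive p(y | alpha_x) from Eq. (8).

    p_hat   : empirical probability hat{p}(y | alpha_x)
    n_alpha : coverage n_{alpha_x}
    K       : number of classes
    beta    : concentration parameter (learned in the model; fixed here for illustration)
    """
    return (p_hat * n_alpha + beta) / (n_alpha + K * beta)

# Low coverage -> pulled toward uniform 1/K; high coverage -> stays near p_hat.
print(smoothed_probability(2 / 3, 3, K=2, beta=1.0))     # 0.6   (uncertain estimate)
print(smoothed_probability(0.9, 1000, K=2, beta=1.0))    # ~0.899 (trusted estimate)
```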
We then employ the atom weight αw to model \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) based on the contribution of each atom in αx. We adopt a deep neural network as the consequent estimator that predicts αw, which improves generalization to unseen cases and enhances noise handling. Details about this neural network are provided in Section C.3 of the Supplementary Information. Given the chosen antecedent αx for the input x, we denote an arbitrary data sample in the dataset as xj ∈ X. The candidate set of atoms for xj is denoted by \({\mathcal{C}}({{\bf{x}}}^{j})\). Each atom candidate in \({\mathcal{C}}({{\bf{x}}}^{j})\) must satisfy both global and local constraints. The hard priors discussed in Section “Human Prior p(b|αx)” provide the global constraint, ensuring that the atom conforms to a human-defined logical form. The local constraint requires that xj satisfies the atom. An atom “awesome > 1”, for example, is sampled only if xj mentions “awesome” more than once. Next, uj is the vector indicating whether each atom oi in αx is also included in the candidate set \({\mathcal{C}}({{\bf{x}}}^{j})\), i.e., \({u}_{i}^{j}={\mathbb{I}}({o}_{i}\in {\mathcal{C}}({{\bf{x}}}^{j}))\), where \({u}_{i}^{j}\) is the i-th element of uj and \({\mathbb{I}}\) is an indicator function. Additionally, we define the region \({{\mathcal{R}}}_{i}\) of atom oi as the set of training samples that satisfy oi, and \({\mathcal{R}}={{\mathcal{R}}}_{1}\cup \ldots \cup {{\mathcal{R}}}_{L}\) as the entire region of the antecedent αx. The deep model then predicts \({{\boldsymbol{\alpha }}}_{w}=\left(\begin{array}{ccc}{w}_{11}&\ldots &{w}_{1L}\\ \ldots &\ldots &\ldots \\ {w}_{K1}&\ldots &{w}_{KL}\end{array}\right)\) from αx and minimizes the following loss objective.
$${{\mathcal{L}}}_{w}=\sum _{({{\bf{x}}}^{j},{y}^{j})\in {\mathcal{R}}}CrossEntropyLoss({{\boldsymbol{\alpha }}}_{w}{{\bf{u}}}^{j},{y}^{j})$$
(9)
This regression, effectively a multinomial logistic regression over the atom indicators, tests combinations of atoms in αx on the training dataset, allowing the deep model to learn αw as the relationship between the prediction y and each atom, based on the human knowledge reflected in the labels. Since x naturally satisfies all atoms in αx, we sum αw across the atoms to derive a logit for each class. We then apply a softmax function across classes to obtain \(\tilde{p}(y| {{\boldsymbol{\alpha }}}_{x})\), the predicted empirical probability.
$$\tilde{p}({y}| {{\boldsymbol{\alpha }}}_{x})=\frac{\exp \left({\sum }_{i\in [1,L]}{w}_{yi}\right)}{{\sum }_{k\in [1,K]}\exp \left({\sum }_{i\in [1,L]}{w}_{ki}\right)}$$
(10)
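For concreteness, a PyTorch sketch of Eqs. (9) and (10) with placeholder values for the weight matrix and the indicator vector (in practice αw is produced by the neural consequent estimator):

```python
import torch
import torch.nn.functional as F

K, L = 2, 3                      # classes x atoms (illustrative sizes)
alpha_w = torch.randn(K, L)      # stand-in for the estimator's predicted atom weights

# Eq. (10): x satisfies every atom in alpha_x, so sum the weights over all atoms per class.
p_tilde = F.softmax(alpha_w.sum(dim=1), dim=0)

# Eq. (9): for a training sample x^j in the region R, u^j marks which atoms it satisfies.
u_j = torch.tensor([1.0, 0.0, 1.0])          # indicator vector u^j
y_j = torch.tensor(1)                        # its label
loss_w = F.cross_entropy((alpha_w @ u_j).unsqueeze(0), y_j.unsqueeze(0))
```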
Subsequently, we use the multi-task learning framework in ref. 31 to train the neural network, ensuring that its predicted probability \(\tilde{p}({y}| {{\boldsymbol{\alpha }}}_{x})\) aligns with the empirical probability \(\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x})\). This alignment is achieved by minimizing the loss specified in the following equation.
$${{\mathcal{L}}}_{r}=\frac{1}{2{\sigma }_{p}^{2}}| | \hat{p}({y}| {\boldsymbol{\alpha }})-\tilde{p}({y}| {\boldsymbol{\alpha }})| {| }^{2}+\frac{1}{2{\sigma }_{n}^{2}}| | {n}_{{\boldsymbol{\alpha }}}-{\tilde{n}}_{{\boldsymbol{\alpha }}}| {| }^{2}+\log {\sigma }_{p}{\sigma }_{n},$$
(11)
where \({\tilde{n}}_{{\boldsymbol{\alpha }}}\) is the predicted coverage given by the neural model, and σp and σn are the standard deviations of the ground-truth probability and coverage, respectively. Finally, we combine the two loss objectives \({{\mathcal{L}}}_{r}\) and \({{\mathcal{L}}}_{w}\) with a hyperparameter λ to train the neural consequent estimator.
$${{\mathcal{L}}}_{c}={{\mathcal{L}}}_{r}+\lambda {{\mathcal{L}}}_{w}$$
(12)
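A sketch of the combined objective in Eqs. (11) and (12), assuming σp and σn are learned through their logarithms as in the multi-task weighting of ref. 31; the argument names are ours and all inputs are assumed to be tensors:

```python
import torch

def consequent_loss(p_hat, p_tilde, n_alpha, n_tilde, loss_w, log_sigma_p, log_sigma_n, lam=1.0):
    """L_c = L_r + lambda * L_w (Eqs. (11)-(12)); sigma_p and sigma_n are learnable via their logs."""
    sigma_p, sigma_n = log_sigma_p.exp(), log_sigma_n.exp()
    loss_r = ((p_hat - p_tilde) ** 2).sum() / (2 * sigma_p ** 2) \
             + ((n_alpha - n_tilde) ** 2).sum() / (2 * sigma_n ** 2) \
             + torch.log(sigma_p * sigma_n)
    return loss_r + lam * loss_w
```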
Deep antecedent generation p(α∣x)
Deep antecedent generation finds an antecedent αx for the explanation by reshaping the given deep model f. Specifically, we replace the prediction layer of the backbone model f with an explanation generator, so that the latent representation z of input x is mapped to an explanation instead of directly to a prediction (e.g., a class label). We outline the generation process before giving a formal definition. First, we precompute the embedding of each atom by averaging the embeddings of all training instances that satisfy the atom. During both training and inference, the antecedent generator sequentially selects atoms to form an explanation. At each selection step, the input embedding z is combined with the embeddings of the previously selected atoms via an encoder to form a latent representation h. We compute a probability distribution over candidate atoms based on their similarity to h, and an atom is sampled from this distribution using the Gumbel-softmax trick, excluding already selected atoms. This process repeats until a predefined number of atoms is selected, forming the final antecedent.
Formally, given z, the representation of input x in the last hidden layer of f, we generate the antecedent αx = (o1, …, oL) with a recursive formulation whose complexity is linear in L (Section “Optimization and Time Complexity”). Given z and o1, …, oi−1, we obtain atom oi by
$${{\bf{h}}}_{i}=Encoder([{\bf{z}};{{\bf{o}}}_{1}\ldots ;{{\bf{o}}}_{i-1}]),\quad p({o}_{i}| {\bf{x}},{o}_{1}\ldots ,{o}_{i-1})=\frac{{\mathbb{I}}({o}_{i}\in {\mathcal{C}}({\bf{x}}))\exp ({{\bf{h}}}_{i}^{T}{{\bf{o}}}_{i})}{{\sum }_{\tilde{o}}{\mathbb{I}}(\tilde{o}\in {\mathcal{C}}({\bf{x}}))\exp ({{\bf{h}}}_{i}^{T}\tilde{{\bf{o}}})}$$
(13)
where oi is the embedding of atom oi and Encoder is a neural sequence encoder such as a GRU (ref. 32) or Transformer (ref. 33). \({\mathbb{I}}\) is the indicator function, and \({\mathcal{C}}({\bf{x}})\) is the set of atom candidates for x. Note that we set the probability of atoms that do not satisfy the global or local constraints to zero, ensuring that only atoms satisfying the specified conditions can be chosen in the subsequent sampling process. We then sample oi from p(oi∣x, o1, …, oi−1) in a differentiable way to enable end-to-end training:
$${o}_{i}=Gumbel(p(\tilde{o}\in {\mathcal{C}}({\bf{x}})\subset {\mathcal{O}}\,| \,{\bf{x}},{o}_{1}\ldots ,{o}_{i-1})),\,\,\,p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})=\prod _{i\in [1,L]}p({o}_{i}| {\bf{x}},{o}_{1}\ldots ,{o}_{i-1})$$
(14)
Gumbel denotes the Straight-Through Gumbel-Softmax (ref. 34), a differentiable function for sampling discrete values. An atom oi is represented as a one-hot vector of dimension \(| {\mathcal{O}}|\), where \({\mathcal{O}}\) is the set of all atoms that satisfy the hard priors. The one-hot vector oi is then multiplied with the atom embedding matrix to derive its embedding oi.
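A simplified sketch of one selection step of the antecedent generator (Eqs. (13) and (14)); the encoder is assumed to return per-position hidden states (e.g., a Transformer encoder with batch_first=True), and the candidate mask encodes the hard-prior and local constraints together with the exclusion of already selected atoms:

```python
import torch
import torch.nn.functional as F

def select_next_atom(z, selected_embs, atom_embs, candidate_mask, encoder, tau=1.0):
    """One step of deep antecedent generation (illustrative, not the exact implementation).

    z              : (d,) input representation from the backbone f
    selected_embs  : list of (d,) embeddings of previously chosen atoms
    atom_embs      : (|O|, d) embedding matrix of all atoms
    candidate_mask : (|O|,) 1 for atoms in C(x) not yet selected, else 0
    encoder        : sequence encoder mapping (1, i, d) to per-position hidden states (1, i, d)
    """
    seq = torch.stack([z] + selected_embs).unsqueeze(0)        # (1, i, d)
    h_i = encoder(seq)[:, -1, :].squeeze(0)                    # (d,) latent state h_i
    logits = atom_embs @ h_i                                   # similarity h_i^T o for every atom
    logits = logits.masked_fill(candidate_mask == 0, float("-inf"))
    # Straight-through Gumbel-softmax: one-hot sample, differentiable w.r.t. the logits.
    one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)
    return one_hot, one_hot @ atom_embs                        # one-hot atom and its embedding
```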
Optimization and time complexity
A deep logic rule reasoning model is learned in two steps. The first step optimizes the neural consequent estimator to learn p(y∣αx) by minimizing the loss \({{\mathcal{L}}}_{c}\) in Eq. (12). The second step converts the deep model f into an explainable version by maximizing p(y∣x, b) in Eq. (6) with a cross-entropy loss. This is equivalent to minimizing the loss \({{\mathcal{L}}}_{d}={{\mathcal{L}}}_{b}({{\boldsymbol{\alpha }}}_{x}^{(s)})-\log p({y}^{* }| {{\boldsymbol{\alpha }}}_{x}^{(s)})\). Here, \({{\mathcal{L}}}_{b}({{\boldsymbol{\alpha }}}_{x}^{(s)})\) punishes explanations that do not fit the human’s prior preference for rules, while \(-\log p({y}^{* }| {{\boldsymbol{\alpha }}}_{x}^{(s)})\) encourages an antecedent \({{\boldsymbol{\alpha }}}_{x}^{(s)}\) that leads to the ground-truth class y* with high confidence. We alternate between the two steps at every training batch. For stable optimization, the parameters of the antecedent generator are frozen during the first step, and the parameters of the consequent estimator are frozen during the second step.
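Schematically, the alternating optimization could be organized as in the following sketch; all modules, optimizers, and loss callables are placeholders rather than our actual implementation:

```python
import torch

def train_step(batch_x, batch_y, antecedent_gen, consequent_est,
               opt_ante, opt_cons, consequent_loss_fn, prior_loss_fn):
    """One alternating optimization step over a batch (illustrative placeholder modules)."""
    # Step 1: fit the consequent estimator (antecedent generator frozen).
    for p in antecedent_gen.parameters():
        p.requires_grad_(False)
    for p in consequent_est.parameters():
        p.requires_grad_(True)
    loss_c = consequent_loss_fn(consequent_est)          # L_c = L_r + lambda * L_w, Eq. (12)
    opt_cons.zero_grad()
    loss_c.backward()
    opt_cons.step()

    # Step 2: fit the antecedent generator (consequent estimator frozen).
    for p in antecedent_gen.parameters():
        p.requires_grad_(True)
    for p in consequent_est.parameters():
        p.requires_grad_(False)
    alpha_x = antecedent_gen(batch_x)                    # sampled via Gumbel-softmax, differentiable
    p_y = consequent_est(alpha_x).gather(1, batch_y.unsqueeze(1)).squeeze(1)
    loss_d = prior_loss_fn(alpha_x) - torch.log(p_y).mean()   # L_d = L_b - log p(y*|alpha_x)
    opt_ante.zero_grad()
    loss_d.backward()
    opt_ante.step()
```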
We analyze the per-sample time complexity of the modules in our method. The complexity is \(O(L\cdot | {\mathcal{O}}| )\) for antecedent generation and \(O({L}^{2}+L| {\mathcal{R}}| )\) for the neural consequent estimator. Therefore, the total time complexity is \(O(L| {\mathcal{O}}| +L| {\mathcal{R}}| )\), since \(L\ll | {\mathcal{O}}|\) and \(L\ll | {\mathcal{R}}|\).