
AI Insights

Martyr “Majid Tajan-Jari”: The Man Who Reached the Heart of the World’s Artificial Intelligence



TEHRAN – Martyr Majid Tajan-Jari was a scientific genius who journeyed from the courtyard of his home in the village of Tajan Jar in Mazandaran Province to the heart of the world’s AI, and whose name is now immortalized beside the word martyr.

Dr. Majid Tajan-Jari was a child who didn’t just take apart a broken radio but pieced its scattered fragments together like a puzzle, crafting a future with his small hands—a future that still echoes in the quiet of his childhood home.

It was as if an inner voice whispered to him: “The future begins right here.” This is the story told by a mother who witnessed every moment of it… and now narrates the silence of a home that her son, with his brilliance and his blood, gave meaning to.

A brilliance that seemed to have come from the future…

Some people are born not just for their own time, but for the times to come. From childhood, Dr. Majid Tajan-Jari showed signs of this timelessness in his demeanor—a sharp, creative mind that quickly blurred the line between play and science.

Zobeideh Khaleghi, the martyr’s mother, recalls: “I remember one day when we went to the store together. Video players had just arrived. Majid was about ten or eleven. He took an old radio from his aunt, dismantled it, understood its components, and rebuilt it from scratch. We just watched, but it was as if he had a blueprint in his mind.”

Their simple courtyard became his laboratory—where he worked with electrical circuits and soldering. “One day, he asked me, ‘Mom, I don’t have a workshop—can I work here?’ I told him, ‘This house is yours. Do whatever you want.’”

Majid’s father, a retired employee, spoke of their financial struggles: “We had little, but Majid never gave up. He taught himself, built, and created.” At eighteen, he built a robot that didn’t just move—it thought.

Zobeideh continues: “We didn’t understand what he was making, but we knew it was something from the future.” Her voice is quiet, choked with emotion: “The pain of losing a child who was building the future is unbearable. The house feels smaller without him, and its silence is louder than ever.”

Yet Majid was not only unmatched in scientific brilliance—his ethics transcended ordinary boundaries. “He was kind to everyone; his respect and politeness were legendary,” his mother says. “Sometimes I thought his ‘grade’ in ethics was infinite.”

Majid’s move to Tehran was quiet and unassuming. “For fourteen years, he worked in silence,” his mother recalls. “I didn’t fully grasp what he was doing, but I felt he was fighting for something greater than himself.”

The scent of his shirt still lingers in the house…

Her voice trembles—not from breaking, but from standing firm, from honoring that pain. Softly, she says: “When I saw his body, it was as if the world stopped. I just looked at him… with that same smile he always had in my memory. I told myself, ‘Be calm—he wasn’t meant to stay. They didn’t bury him in the earth; they took him to the sky.’”

“He always said, ‘Kiss my throat, Mom…’” A brief silence follows. The mother looks down, then speaks a heavy truth: “Every time I visited his home, he’d say, ‘Mom, kiss my throat…’ Now I understand. I’m ashamed that the last time, I couldn’t kiss his throat.”

Our hearts are broken, but we have not collapsed

Amid this crushing grief, a voice rises from the depths of faith—not of mourning, but of resilience: “My sister calls every day and asks, ‘Zobeideh, I’m just his aunt, and I’m burning with grief—how are you still breathing?’ And I tell her, ‘Patience is the only thing Majid planted in my heart. He left, but he left his patience behind for me.’”

“His memory has lit up our lives.”


“We mothers live with our skin and bones—we touch pain. But every night, I tell myself, ‘Majid, my soul, though they took your body from me, your name, your memory, your voice are still with me. Sometimes, I still hear the door… as if you’re coming home, turning the key, saying, ‘Mom, I hope you’re not tired.’”

Ali Tajan-Jari, the martyr’s father, a quiet man with a gaze heavy with years of experience, sits on the couch, flipping through old photographs.

In a simple home, he had a global mind

His father, with a faint smile, glances toward the courtyard. A quiet pride lingers in his eyes: “That simple home, that humble courtyard, became the birthplace of boundless dreams.”

“From that small room, he connected with the world. He said, ‘I will stay in Iran, but my scientific voice must be heard beyond borders.’ And so it was. I often heard that when asked where his students were, he’d smile and say, ‘Everywhere… Spain, England, Canada, Turkey…’”

He built bridges from failure

A brief silence lingers between the father’s words before he continues: “In one of our talks, he said, ‘I’ve failed many, many times… but I built a home—a scientific family. All my chances were there.’ That group was called ‘AIO Learn’—young people who rose from the ground and reached the summit.”

The father places a hand on his chest, as if something deep within him speaks: “We didn’t know Majid was teaching. Not out of secrecy, but because, amid building robots and AI projects, that side of him was less visible.”

“One day, we heard his students had surpassed 500,000. Majid was a teacher without borders—with a virtual blackboard, yet magnificent. And all of it began in a room that didn’t even have an extra chair. Just love, a laptop, and a light of passion.”

“He always said, ‘Science must have attraction—not fear, not force… only motivation and the desire to know.’”

A Quran that still carries his presence…

Moments later, the father grows quieter. His eyes settle on a small Quran on the table—the one that had accompanied his son for years. Slowly, he takes out his glasses, places them on, and silently recites a verse.

His voice is soft, but the words are clear and firm. He closes the Quran, running his hand over its cover—as if still feeling the warmth of his son’s hands.

In the silence of the house, only the sound of his breathing can be heard. His gaze lingers on his son’s portrait. He says nothing. But that look tells a thousand unspoken words.

The end of a story, the beginning of a path

This chapter of Majid’s life was not just a career—it was part of Iran’s scientific identity today. A young man who chose to stay instead of emigrate, to build instead of complain, and to take root instead of leave.

In a simple home, with hands on a keyboard and a heart full of conviction, he trained students who now carry his legacy across the world.

The legacy he planted in life…

Mohaddeseh Tajan-Jari, the martyr’s sister, sits composed in the frame of the image. Soft light from a half-open window falls on her face. Her voice, delicate and measured, wavers between sorrow and pride:

“Sometimes they ask, ‘What did Majid leave behind?’ He had no children, no family of his own… But I say, ‘If only they knew what a child truly is.’”

“Majid did not father a child of his blood, but he fathered one of his mind—he named it his company. He always said with certainty, ‘I built AIO Learn… this is my child.’”


She pauses briefly, then adds: “Majid wasn’t just my brother—he was my confidant. We never fought—not because we couldn’t, but because there was no need. We were friends, united in thought, concern, and heart. More than a brother, he was my teacher—one whose silence itself was a lesson.”

“When my child was born, he was genuinely happy. He’d buy toys and say, ‘He must grow up intelligent.’ He wasn’t a father, but he lived fatherhood. In action, he was a martyr—not just in title.”

Her voice grows quieter, but the meaning grows heavier: “He didn’t see martyrdom only in combat. He stayed up till dawn coding, creating ideas, building the future. He wrote projects that seemed to come from decades ahead.”

“His jihad was a jihad of thought—his battlefield was science, his weapon genius. Martyrdom was not the end of his path—it was the manifestation of a life entirely devoted.”

My brother said ‘no’ to money, ‘yes’ to his homeland

The narrative shifts—from emotion to loyalty, from offers to faith. “When a major European company made him a staggering offer, everyone thought his choice was obvious. High salary, easy immigration… I told him, ‘Majid, it’s your decision.’ He smiled and said, ‘Mahdeh, I can’t live in a country where they lie about my people day and night. Even if I have to live in a tent, I’d rather be in my homeland.’”

An ascension that was preordained

Her gaze drifts to a distant point—a moment of silence. Then, with inner conviction, she says: “Majid wasn’t born—it was as if he descended. He came to build, to teach, to inspire… and when his mission was complete, he left. Not in silence, but at his peak.”

“I always think God entrusted Majid to us for only thirty-five years. Now, his mission is over… but his voice still flows.”

We are still standing…

Today, the small room in the Tajan-Jar home is silent. The sound of soldering is gone, the monitor remains dark, the desk empty. But the ideas born in that room are more alive than ever—in the pulse of research, the veins of science, the sky of hope.

Martyr Dr. Majid Tajan-Jari is no longer among us, but his vision still shines in the eyes of his students. His thoughts live on in the code he wrote, the projects he brought to life, the dreams he refused to leave unfinished.


He is gone, but his path remains. His principles—his belief in staying, in building, in nurturing elites on his homeland’s soil—endure.

A father, with eyes full of pride, spoke of a son who, in silence, in dignity, in action, wrote a new definition of scientific jihad.

And today, we are certain: some people do not come to stay—they come to light a lamp that will illuminate the path for years to come…

Martyr Dr. Majid Tajan-Jari was not just a scientific genius—he was the embodiment of committed, scholarly, and national life. A man who could have crossed borders, shone in the world’s best institutions, but chose to remain in this soil, take root, and build a bright future.

(Source: Mehr News Agency)




AI Insights




Builder AI launches liquidation process in Delaware after controversy over sales overstatement, amid the Nate founder’s federal indictment, GameOn’s false data, and other cases

[Picture = Gemini]

As the generative artificial intelligence (AI) craze approaches its peak, promises that “AI will do everything on its own” are collapsing across Silicon Valley. The bankruptcy of Builder AI, once revered as a unicorn, is a symbolic case.

According to the New York Times on the 31st (local time), Builder AI promoted itself heavily on strong growth in 2024, but a board investigation confirmed that sales had been overstated. After a management shake-up and a liquidity crisis, the company entered liquidation proceedings in a Delaware court in the first half of 2025. As suspicion spread that humans were working behind the scenes of “Natasha,” the AI assistant that was supposed to build apps automatically, management explained that “AI was an auxiliary tool and did not replace people,” but failed to restore trust.

The incident shows how easily scrutiny of a technology’s actual level of automation and of its financial figures can be deferred while the “AI” label draws investor and media attention.

Similar scenes played out on other stages. The shopping app “Nate” promoted the claim that “deep learning replaces payment and checkout,” but allegations arose that outsourced staff in the Philippines handled orders manually. In the spring of 2025, the U.S. Attorney’s Office for the Southern District of New York (SDNY) charged its founder with investor fraud.

San Francisco startup “GameOn” put forward an AI sports chatbot, but its leadership was indicted over false financial data, fake audit reports, and allegedly inflated sales. What these cases share is “AI-washing”: promoting processes that are largely performed by humans, or that have low automation maturity, as if they were fully automatic.

“AI done by humans” is not rare at large corporations, either. Amazon’s “Just Walk Out” was conceived as sensors and computer vision handling checkout automatically, but reports persisted that staff identified and reviewed transactions in actual operations. Amazon denied the accusations of exaggeration, but adjusted its store strategy to focus on smart carts.

Presto Automation, which introduced an automated-response solution for fast-food drive-throughs, was also found to have had humans process a significant share of orders at certain times. A legal-technology startup advocated automating personal-injury case documents, but when internal testimony emerged that much of the actual work depended on human review, the company emphasized that “the combination of AI and humans is essential for high quality.”

“The fall of Builder AI clearly shows what to believe and what to doubt in the current AI boom,” the New York Times wrote. “AI is being sold, but automation is not; the gap between the actual level of the technology and market expectations remains large.”





AI Insights

Can AI bring more good than harm to the future of our jobs? Here’s what the data says



No generation is spared from the cultural upheaval of new technology. In the 2020s, it is AI fuelling that disruption.

Love it or fear it, artificial intelligence offers endless possibilities, and many people will by now have felt its effects on a personal level.

While many fear for their jobs, however, others see a widening of possibilities.

Max Hamilton is worried about the impact AI would have on creatives, including copyright issues. (Supplied: Max Hamilton)

Forced to seek change

Max Hamilton, a graphic designer with over two decades of experience, has already adapted her career to meet the threat of generative AI head on.

The increasingly scarce availability of jobs pushed her to venture into illustration work for children’s books.

“I saw that happening a few years ago and that’s when I pivoted,” Ms Hamilton said.

I’ve been really focusing on using watercolour and hand drawing, which I did on purpose because I thought that might set me apart from having the computer-generated look.

To stay ahead of the fast-changing landscape, she has also expanded her skill set to include writing, meaning she can be involved in every aspect of producing a book.

“As a creative, I think we like to think that our creativity is our special weapon,” Ms Hamilton said.

How will AI affect jobs?

Data shows that creatives like Ms Hamilton are right to expect increasing AI influence in their sectors.

A recent Jobs and Skills Australia (JSA) report confirms that artificial intelligence will bring significant change to the labour market, whether through automation or augmentation.

The body assessed various tasks within ANZSCO-listed occupations and ranked them based on the degree to which AI could impact them.

Here are the sectors JSA predicts are most likely to be automated by artificial intelligence, with existing workflows replaced.

Here are the sectors most likely to be augmented by artificial intelligence, improving the output of existing workers.

Evan Shellshear, an innovation and technology expert from The University of Queensland, explains what this means for the availability of jobs in the market.

“It’s not jobs that are at risk of AI, it’s actual tasks and skills,” Dr Shellshear said.

We’re seeing certain skills and parts of jobs disappearing, but not necessarily whole occupations disappearing.

The report further supports this, saying Generative AI technologies are more likely to help boost workers’ productivity, as opposed to replacing them, especially in high-skilled occupations.

In fact, Dr Shellshear believes there’s a likelihood AI will create job opportunities.

“It’s making a lot of things that were impossible, possible,” he says, especially for small businesses.

“Gen AI can lower the cost for things, expertise and knowledge that were out of reach in the past.”

An opportunity to create the unthinkable

Growing up a big fan of science fiction, Melanie Fisher jumped at the chance to experiment with Generative AI shortly after ChatGPT was released.

Ms Fisher started off by testing the tool’s knowledge of food regulation, with which she was familiar from years of experience in the industry.

“It came up with some untrue stuff, so early on I learnt you have to be careful,” said the 67-year-old, who is based in Canberra.


Melanie Fisher built an app for her 3-year-old grandchild, and now it’s a bonding activity for the two. (Supplied: Melanie Fisher)

But Ms Fisher continued pushing the bounds of what AI could offer — using it to find new recipes and suggestions for things to do — before landing on the idea to create a game app for her 3-year-old granddaughter Lilly*.

When she heard AI could code, she thought to herself, “Oh I’d love to try that, but … I’m not an IT person or anything.”

So, she threw the question to ChatGPT.


Melanie spelled out her request on the generative AI tool and made it clear she had no relevant background. (Supplied: Melanie Fisher)

The tool recommended a program that allowed her to drag and drop different elements to produce a coherent story mode gameplay.

Ms Fisher didn’t have to look far for inspiration.

“[The game was] based on stories Lilly* and I made up about her being a girl pirate with her friends, and they have adventures together,” Ms Fisher said.

It took three weeks of work to bring her vision to reality, even getting the characters to loosely resemble Lilly*.

Now the game has become a special pet project for the duo.


Melanie continues to build on the gameplay with input from her granddaughter.   (Supplied: Melanie Fisher)

Drawing from her own experience, Ms Fisher sees AI as a double-edged sword.

“I think it’s a great leap forward for people, but I do very much worry it’s going to massively displace lots of people from work,” she said.

Transition still in early days

Many professionals such as recruiters, university staff and health practitioners have incorporated AI into their workflows.

More recently, Commonwealth Bank of Australia made headlines by cutting jobs it attributed to artificial intelligence, only to later apologise and backtrack on the decision.

But news stories about corporate lay-offs and downsizing don’t necessarily point to an AI takeover, according to Professor Nicholas Davis, a former World Economic Forum executive who is now an artificial intelligence expert at the University of Technology Sydney (UTS).

He believes these trends are being driven by “early adopters” and foresees “a disconnect between expectation versus reality”.

“We’re likely to see organisations lay off people in anticipation of gains and then rehiring because it doesn’t quite work the way they expect,” said Professor Davis.

“We’re at very early stages of using the latest forms of AI at the enterprise level.

“Most organisations have yet to see a measurable positive impact on the bottom line.”

An example he provided is how the introduction of self check-out machines at supermarkets resulted in higher levels of staff stress, customer frustration and costs from theft.

This has led a number of UK and US chains to reintroduce manned tills.

“The consumer experience is different to the organisational value and experience,” warns Professor Davis.


Nicholas Davis believes humans are still needed alongside AI for it to perform sustainably and reliably. (ABC News: Ian Cutmore)

How can we better prepare for an AI-driven world?

Despite having success with the app, Ms Fisher says, “I’ve learnt a little bit but I don’t think I could become a game developer.”

Speaking to this, Dr Shellshear agrees there’s a distinction to be made between what is possible with AI and the value humans have to offer.


Dr Evan Shellshear believes people should focus on harnessing the right skills for an AI-driven future. (Supplied: Dr Evan Shellshear)

While AI can help a person attain new skills, they still need education, training and real-world expertise to get to a professional level, he adds.

Having conducted his own research into AI’s impact on jobs, with a keen interest in what remains relevant in the future, he found professions involving communication, management, collaboration, creativity, assisting and caring to be the most difficult to replicate.

Other human traits, such as problem-solving, resilience and attentiveness, are also irreplaceable, says Professor Davis.

But he says that having a varied set of skills can put you at an advantage.

“The more you’re able to add value, the less it matters that things get taken away,” he said.

“But if your job is doing one specific thing or creating one style, then that’s where it gets problematic.”

“Embracing, engaging and reinventing is how you benefit.”

Here is Dr Shellshear’s advice on staying ahead of the game:

“Recognise its impact on your life as an individual, especially from a job perspective, and ask yourself: ‘How do I position myself to continue to add value with these tools around me?’

“At some point, you have to learn how to integrate [AI] into your workflows, otherwise you [risk no longer being] efficient or relevant.”

*Name changed for privacy





AI Insights

Toward faithful and human-aligned self-explanation of deep models



Formulation of logic rule explanations

For a given input x from the dataset X, our explanation α = (αx, αw) comprises two components: an antecedent αx and a linear weight αw. The antecedent αx and a consequent y together form a logic rule αx ⇒ y, as illustrated in Fig. 1a. The linear weight αw indicates the contribution of the atoms in the antecedent αx within the logic rule. Meanings of symbols used in this paper are defined in Section C.1 of the Supplementary Information.

An antecedent αx represents the condition under which a rule applies and corresponds to an explanation expressed in logical form. It is defined as a sequence αx = (o1,…, oL), where each oi is an atom, and L is the length of the sequence. An atom is the smallest unit of explanation and corresponds to a single interpretable feature of a given input—for instance, a condition such as “awesome ≥2”. These interpretable features may differ from the features used by deep learning models. They can have different granularities (e.g., words or phrases vs. tokens), be based on statistical properties (e.g., word frequency), or be derived using external tools (e.g., grammatical tagging of a word). Mathematically, each atom oi is a Boolean-valued function that returns true if the ith interpretable feature is present in the input x, and false otherwise. Additional details about the atom selection process are provided in Section C.3 of the Supplementary Information. An input sample x is said to satisfy an antecedent αx if the logical condition αx(x) evaluates to true.
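The antecedent-and-atom structure above can be sketched in code. This is an illustrative toy, not the paper's implementation: atoms are Boolean-valued functions over a raw text input, an antecedent is a sequence of atoms, and satisfaction is their conjunction. The word-count atom and the example review are hypothetical.

```python
# Toy sketch of atoms and antecedents (not the paper's code).
# An atom is a Boolean-valued function of the input x; an antecedent
# alpha_x = (o_1, ..., o_L) is a sequence of atoms.

def make_count_atom(word, min_count):
    """Atom: true iff `word` appears at least `min_count` times in x."""
    def atom(x):
        return x.lower().split().count(word) >= min_count
    return atom

# Antecedent of length L = 2, e.g. ("awesome >= 2", "not >= 1").
antecedent = [
    make_count_atom("awesome", 2),
    make_count_atom("not", 1),
]

def satisfies(x, antecedent):
    """x satisfies alpha_x iff every atom evaluates to true."""
    return all(atom(x) for atom in antecedent)

review = "awesome food awesome service not cheap"
print(satisfies(review, antecedent))  # True: "awesome" x2, "not" x1
```

Interpretable features of other granularities (phrases, frequencies, grammatical tags) would simply be different atom constructors returning Boolean functions of x.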

The consequent y denotes the model’s predicted output, given that the antecedent is satisfied. In a classification task, y typically corresponds to the predicted class label; in regression, y would be a real-valued number.

Finally, the linear weight αw models the contribution of each atom in the logical relationship αx ⇒ y. It is represented as a matrix \({{\boldsymbol{\alpha }}}_{w}=\left(\begin{array}{ccc}{w}_{11}&\ldots &{w}_{1L}\\ \ldots &\ldots &\ldots \\ {w}_{K1}&\ldots &{w}_{KL}\end{array}\right)\), where K is the number of possible classes and wki indicates the contribution of atom oi to the prediction of class k. The magnitude of each weight reflects the strength of its corresponding atom’s contribution to the output prediction.
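As a minimal numerical sketch of this K × L weight matrix (values made up for illustration, not taken from the paper): with K = 2 classes and L = 3 atoms, the per-class score of an input is the sum of the weights of the atoms that hold.

```python
# Toy illustration of the linear weight alpha_w as a K x L matrix;
# w[k][i] is atom o_i's contribution to class k. Values are invented.
import numpy as np

alpha_w = np.array([
    [-1.2,  0.3,  0.8],   # contributions to class 0 (e.g. negative)
    [ 1.5, -0.1, -0.6],   # contributions to class 1 (e.g. positive)
])

# Truth values of the L = 3 atoms for one input: o_1, o_2 hold, o_3 not.
atoms = np.array([1.0, 1.0, 0.0])

# Per-class score: weighted sum over the atoms that hold.
scores = alpha_w @ atoms
print(scores)  # [-0.9  1.4]: class 1 receives the larger contribution
```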

Framework for deep logic rule reasoning

Let us denote by f a deep learning model that estimates the probability p(y|x), where x is the input data sample and y is a candidate class. We upgrade model f to a self-explaining version by adding a logic rule explanation α. Then, we can reformulate p(y|x, b) as

$$p({y}| {\bf{x}},b)=\sum _{{\boldsymbol{\alpha }}}p({y}| {\boldsymbol{\alpha }},{\bf{x}},b)p({\boldsymbol{\alpha }}| {\bf{x}},b)=\sum _{{\boldsymbol{\alpha }}}p({y}| {\boldsymbol{\alpha }})p({\boldsymbol{\alpha }}| {\bf{x}},b),\quad s.t.,\quad \Omega ({\boldsymbol{\alpha }})\le S$$

(3)

Here, b represents a human’s prior belief about the rules (e.g., the desirable form of atoms), Ω(α) is the required number of logic rules to explain a given input x, and S is the number of samples (logic rules chosen by the model). Eq. (3) includes two constraints essential for ensuring explainability. The first constraint, p(y|α, x, b) = p(y|α), requires that the explanation α contain all information in the input x and b that is useful for predicting y. Without this constraint, the model may “cheat” by predicting y directly from the input instead of using the explanation, which decreases faithfulness. The second constraint, Ω(α) ≤ S, requires that the model be well explained using only S explanations, where S is small enough to ensure readability (S = 1 in our implementation). We can further decompose Eq. (3) based on the independence between the input x and the human prior belief b (proof and assumptions in Section C.2 of the Supplementary Information):

$$p({y}| {\bf{x}},b)=\sum _{{\boldsymbol{\alpha }}}p({y}| {\boldsymbol{\alpha }})p({\boldsymbol{\alpha }}| {\bf{x}},b)\propto \sum _{{\boldsymbol{\alpha }}}p(b| {\boldsymbol{\alpha }})\cdot p({y}| {\boldsymbol{\alpha }})\cdot p({\boldsymbol{\alpha }}| {\bf{x}}),\,\,s.t.,\,\,\Omega ({\boldsymbol{\alpha }})\le S$$

(4)

Then, we further decompose Eq. (4) using an antecedent αx and its linear weight αw:

$$\begin{array}{rcl} p(y|{\mathbf{x}},b)&\propto & \sum\limits_{{\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w} p(b | {\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w)\cdot p(y | {\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w)\cdot p({\boldsymbol{\alpha}}_x, {\boldsymbol{\alpha}}_w | {\mathbf{x}}) \\ &=& \sum\limits_{{\boldsymbol{\alpha}}_x} p(b | {\boldsymbol{\alpha}}_x) \left(\sum\limits_{{\boldsymbol{\alpha}}_w} p(y | {\boldsymbol{\alpha}}_w) \cdot p({\boldsymbol{\alpha}}_w | {\boldsymbol{\alpha}}_x) \right) \cdot\ {p({\boldsymbol{\alpha}}_x | {\mathbf{x}})}, \\ &=& \sum\limits_{{\boldsymbol{\alpha}}_x} \underbrace{p(b | {\boldsymbol{\alpha}}_x)}_{\begin{array}{c}{\rm{Human}}\\ {\rm{prior}}\end{array}} \cdot \underbrace{p(y | {\boldsymbol{\alpha}}_x)}_{\begin{array}{c}{\rm{Consequent}}\\ {\rm{estimation}}\end{array}} \ \cdot\ \ {\underbrace{p({\boldsymbol{\alpha}}_x | {\mathbf{x}})}_{\begin{array}{c}{\rm{Deep}}\,{\rm{antecedent}}\\ {\rm{generation}}\end{array}}}, \quad s.t., \quad {{\Omega}}({\boldsymbol{\alpha}}_x) \leq S \end{array}$$

(5)

In Eq. (5), two additional constraints are introduced to ensure that the weight αw functions as a faithful explanation. The first constraint, p(y|αw) = p(y|αx, αw), is designed to prevent the model from bypassing the explanatory weight αw and relying instead on latent representations of the antecedent αx for predicting the consequent y. The second constraint, p(αw|αx) = p(αw|αx, x), ensures that the estimation of αw is based solely on the selected antecedent αx and not directly on the raw input x. This guards against information leakage that could undermine the interpretability of the explanation.

We assume p(b|αx) = p(b|αx, αw), as b represents a human’s prior belief about the rules encoded in αx, which should not depend on how the model weights them internally. We can observe that the only difference between Eq. (5) and Eq. (4) lies in the use of the antecedent αx instead of the full explanation α. This implies that the introduction of the weight αw affects only the internal estimation process of the consequent, and without explicit guidance, this process may diverge significantly from human expectations.

The three derived terms correspond to the three main modules of the proposed framework, SELOR. The first component, the human prior p(b|αx), encodes human guidance on preferred rule forms, aiming to reduce the likelihood of misunderstanding, as discussed in Section “Human Prior p(b|αx)”. The second, consequent estimation p(y|αx), models the relationship between the explanation αx and the predicted output y through the weight αw. This weight is carefully estimated to ensure a meaningful and consistent relationship, so that each explanation naturally leads to the prediction according to human perception, as described in Section “Consequent Estimation p(y|αx)”. Lastly, deep antecedent generation p(αx|x) leverages the deep representation of input x learned by the given deep model f to infer an appropriate explanation αx, as elaborated in Section “Deep Antecedent Generation p(α|x)”.

The sparsity constraint Ω(αx) ≤ S for the explanations can be enforced by sampling from p(αx|x). In particular, we rewrite Eq. (5) as an expectation and estimate it through sampling:

$$\begin{array}{lll}p({y}| {\bf{x}},b)\;\propto \;\sum\limits_{{{\boldsymbol{\alpha }}}_{x}}p(b| {{\boldsymbol{\alpha }}}_{x})\cdot p({y}| {{\boldsymbol{\alpha }}}_{x})\cdot p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})\\\qquad\qquad =\mathop{{\mathbb{E}}}\limits_{{{\boldsymbol{\alpha }}}_{x} \sim p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})}p(b| {{\boldsymbol{\alpha }}}_{x})\cdot p({y}| {{\boldsymbol{\alpha }}}_{x})\approx \frac{1}{S}\sum\limits_{\begin{array}{c}s\in [1,S]\\ {{\boldsymbol{\alpha }}}_{x}^{(s)} \sim p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})\end{array}}p(b| {{\boldsymbol{\alpha }}}_{x}^{(s)})\,p({y}| {{\boldsymbol{\alpha }}}_{x}^{(s)})\end{array}$$

(6)

where \({{\boldsymbol{\alpha }}}_{x}^{(s)}\) is the sth sample of αx. For example, to maximize the approximation term with S = 1, the antecedent generator p(αx|x) must find a single sample \({{\boldsymbol{\alpha }}}_{x}^{(s)}\) that yields the largest \(p(b| {{\boldsymbol{\alpha }}}_{x}^{(s)})p({y}| {{\boldsymbol{\alpha }}}_{x}^{(s)})\), and it needs to assign a high probability to this best \({{\boldsymbol{\alpha }}}_{x}^{(s)}\). Otherwise, other samples with a lower \(p(b| {{\boldsymbol{\alpha }}}_{x}^{(s)})p({y}| {{\boldsymbol{\alpha }}}_{x}^{(s)})\) may be generated, thereby decreasing p(y|x, b). This ensures the sparsity of p(αx|x), which improves the model’s interpretability.
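The S-sample approximation in Eq. (6) can be sketched as a small Monte Carlo estimate. The candidate antecedents and all probabilities below are toy placeholders, not the paper's learned distributions; the point is only the mechanics of sampling αx from p(αx|x) and averaging p(b|αx)·p(y|αx).

```python
# Toy Monte Carlo estimate of Eq. (6); all distributions are invented.
import random

# Candidate antecedents with generation probabilities p(alpha_x | x).
candidates = ["awesome>=2", "not>=1", "tasty>=1"]
p_alpha_given_x = [0.7, 0.2, 0.1]

# Toy prior p(b | alpha_x) and consequent p(y | alpha_x) per candidate.
p_b = {"awesome>=2": 0.9, "not>=1": 0.8, "tasty>=1": 0.7}
p_y = {"awesome>=2": 0.9, "not>=1": 0.3, "tasty>=1": 0.8}

def estimate_p_y_given_x(S, seed=0):
    """Average p(b|alpha) * p(y|alpha) over S samples from p(alpha|x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(S):
        a = rng.choices(candidates, weights=p_alpha_given_x, k=1)[0]
        total += p_b[a] * p_y[a]
    return total / S

print(estimate_p_y_given_x(S=1))     # single-sample estimate (S = 1)
print(estimate_p_y_given_x(S=1000))  # approaches the true expectation
```

With S = 1, maximizing the estimate pushes the generator to concentrate its mass on the single best antecedent, which is exactly the sparsity effect described above.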

Human prior p(b|αx)

The human prior p(b|αx) = ph(b|αx)ps(b|αx) consists of hard priors ph(b|αx) and soft ones ps(b|αx).

Hard priors delimit the feasible solution space for the rules: ph(b|αx) = 0 if αx is not a feasible solution. Humans can easily define hard priors on αx by choosing the atom types, such as whether the interpretable features are words, phrases, or statistics like word frequency, and by setting the antecedent’s maximum length L. SELOR does not require a predefined rule set; nonetheless, we allow users to provide one if that is more desirable in some application scenarios. A large solution space increases the time cost of deep logic rule reasoning (Section “Optimization and Time Complexity”) but also decreases the probability of introducing undesirable bias.

Soft priors model different levels of human preference for logic rules. For example, people may prefer shorter rules or high-coverage rules that are satisfied by many input samples. An energy function can parameterize such soft priors: \({p}_{s}(b| {{\boldsymbol{\alpha }}}_{x})\propto \exp (-{{\mathcal{L}}}_{b}({{\boldsymbol{\alpha }}}_{x}))\), where \({{\mathcal{L}}}_{b}\) is a loss function that penalizes undesirable logic rules. We do not include any soft priors in our current implementation.

For example, suppose we are inducing logic rules to explain a sentiment classifier’s decision on restaurant reviews. The interpretable features αx may include binary indicators for the presence of words like “awesome”, “tasty”, or “not” in the input text. A hard prior ph(b|αx) may rule out any rule whose antecedent includes more than L = 2 words (e.g., a rule using “awesome” and “not” is allowed, but not one using “awesome”, “not”, and “tasty” together), if the user has defined a maximum antecedent length of 2 as part of their hard prior. A soft prior ps(b|αx) can reflect a user’s preferences over logic rules. For instance, if a user prefers commonly used words, a rule like “awesome ≥ 1” may be favored over “pulchritudinous ≥ 1”, even though both convey a positive meaning.

Consequent estimation p(y|αx)

Consequent estimation models p(y|αx), the relationship between the antecedent αx and the prediction y, using the weight αw. The weight αw is computed to ensure a meaningful and consistent relationship, so that each explanation naturally leads to the prediction according to human perception. This is achieved by testing the logic rule αx ⇒ y across the entire training dataset, ensuring that it represents the human knowledge embedded in the data distribution.

A straightforward way to compute p(y|αx) is empirical estimation: first, collect all samples that satisfy the antecedent αx, and then calculate the percentage of them that have label y (ref. 30). For example, given the explanation αx = “awesome ≥ 2”, if we collect all instances in which “awesome” appears at least twice and find that 90% of them have label y = positive sentiment, then p(y|αx) = 0.9. A large p(y|αx) corresponds to global patterns that naturally align with human perception. Mathematically, this is equivalent to approximating p(y|αx) with the empirical probability \(\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x})\):

$$\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x})={n}_{{{\boldsymbol{\alpha }}}_{x},y}/{n}_{{{\boldsymbol{\alpha }}}_{x}}$$

(7)

where \({n}_{{{\boldsymbol{\alpha }}}_{x},y}\) is the number of training samples that satisfy the antecedent αx and have the consequent y, and \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) is the number of training samples that satisfy the antecedent αx. Directly setting p(y|αx) to \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) causes three problems. First, when \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) is not large enough, the empirical probability \(\hat{p}(y| {\boldsymbol{\alpha }})\) may be inaccurate, and this formulation inherently lacks a model of that uncertainty. Second, statistically modeling the probability of y based solely on αx, without detailing the contribution of each atom in αx, may leave users feeling that part of the explanation remains unaddressed. Third, computing \(\hat{p}(y| {\boldsymbol{\alpha }})\) for every antecedent α is intractable, since the number of feasible antecedents A increases exponentially with the antecedent length L.
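Eq. (7) is a simple counting estimate. A sketch in plain Python, with a hypothetical `satisfies` predicate standing in for atom evaluation and a toy dataset of word-count reviews:

```python
def empirical_p(dataset, satisfies, antecedent, label):
    """Empirical estimate of Eq. (7): among training samples satisfying the
    antecedent, the fraction carrying the given label. Returns (p_hat, coverage)."""
    covered = [(x, y) for x, y in dataset if satisfies(x, antecedent)]
    if not covered:
        return 0.0, 0
    n_match = sum(1 for _, y in covered if y == label)
    return n_match / len(covered), len(covered)

# Toy dataset: reviews as word-count dicts with sentiment labels.
data = [({"awesome": 2}, "pos"), ({"awesome": 3}, "pos"), ({"awesome": 2}, "neg")]
sat = lambda x, a: x.get("awesome", 0) >= 2    # hypothetical check for atom "awesome >= 2"
p_hat, n_cov = empirical_p(data, sat, ("awesome>=2",), "pos")
```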

To address these problems, we employ a neural estimation of the categorical distribution, which jointly models \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) and the uncertainty caused by low-coverage antecedents. For example, suppose the antecedent αx is “tasty ≥ 2”. If this antecedent is satisfied by only 3 training samples, among which 2 have the label y = negative sentiment, then the empirical estimate is \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})=2/3\). However, since \({n}_{{{\boldsymbol{\alpha }}}_{x}}=3\) is small, the model treats this estimate as uncertain, and the resulting p(y|αx) is smoothed toward a more uniform distribution according to the learned β. In contrast, for a high-coverage antecedent like “great ≥ 1”, if 900 out of 1000 samples have positive sentiment, the empirical estimate of 0.9 is trusted more, and p(y|αx) stays close to 0.9. See Section “Explainability Evaluation on Data Consistency” for the approximation capability of our model.

Assume that, given the antecedent αx, the class y follows a categorical distribution in which each category corresponds to a class. We define β as the concentration hyperparameter of this categorical distribution, which controls how uniformly the probability is spread across the classes: higher values of β lead to more uniform distributions, while lower values concentrate the probability mass on fewer classes. Then, according to the posterior predictive distribution, y takes one of K potential classes, and we can compute the probability of a new observation y given the existing observations:

$$p(y| {{\boldsymbol{\alpha }}}_{x})=p({y}| {{\mathcal{Y}}}_{{{\boldsymbol{\alpha }}}_{x}},\beta )\approx \frac{\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x}){n}_{{{\boldsymbol{\alpha }}}_{x}}+\beta }{{n}_{{{\boldsymbol{\alpha }}}_{x}}+K\beta }$$

(8)

Here, \({{\mathcal{Y}}}_{{{\boldsymbol{\alpha }}}_{x}}\) denotes the \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) observations of the class label y obtained by checking the training data, and β is trained automatically. Eq. (8) reduces to Eq. (7) as \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) increases to ∞, and becomes a uniform distribution as \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) goes to 0. Thus, a low-coverage antecedent with a small \({n}_{{{\boldsymbol{\alpha }}}_{x}}\) is treated as uncertain (i.e., close to a uniform distribution). By optimizing Eq. (8), our method automatically balances the empirical probability \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) and the number of observations \({n}_{{{\boldsymbol{\alpha }}}_{x}}\). The probability p(y|αx) also serves as the confidence score for the logic rule αx ⇒ y.
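The smoothing in Eq. (8) can be computed directly from the empirical estimate, the coverage, β, and the number of classes K. A small sketch reproducing the two behaviors described above (low coverage pulled toward uniform, high coverage trusted):

```python
def smoothed_p(p_hat, n_cov, beta, num_classes):
    """Posterior predictive smoothing from Eq. (8): interpolates between the
    empirical estimate (trusted at high coverage) and a uniform distribution
    (the limit as coverage goes to 0)."""
    return (p_hat * n_cov + beta) / (n_cov + num_classes * beta)

low = smoothed_p(p_hat=2 / 3, n_cov=3, beta=1.0, num_classes=2)      # pulled toward 0.5
high = smoothed_p(p_hat=0.9, n_cov=1000, beta=1.0, num_classes=2)    # stays near 0.9
uniform = smoothed_p(p_hat=0.7, n_cov=0, beta=1.0, num_classes=2)    # exactly 1/K
```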

We then employ the atom weight αw to model \(\hat{p}(y| {{\boldsymbol{\alpha }}}_{x})\) based on the contribution of each atom in αx. We adopt a deep neural network as the consequent estimator that predicts αw, which improves generalization to unseen cases and enhances noise handling. Details of the deep neural network are given in Section C.3 of the Supplementary Information. Given the chosen antecedent αx for the input x, we denote an arbitrary data sample in the dataset by xj ∈ X. The candidate set of atoms for xj is denoted by \({\mathcal{C}}({{\bf{x}}}^{j})\). Each atom candidate in \({\mathcal{C}}({{\bf{x}}}^{j})\) must satisfy both a global and a local constraint. The hard priors discussed in Section “Human Prior p(b|αx)” provide the global constraint, ensuring that the atom conforms to a human-defined logical form. The local constraint requires that xj satisfy the atom. An atom “awesome > 1”, for example, is sampled only if xj mentions “awesome” more than once. Next, uj is the vector indicating whether each atom oi in αx is also included in the candidate set \({\mathcal{C}}({{\bf{x}}}^{j})\), i.e., \({u}_{i}^{j}={\mathbb{I}}({o}_{i}\in {\mathcal{C}}({{\bf{x}}}^{j}))\), where \({u}_{i}^{j}\) is the i-th element of uj and \({\mathbb{I}}\) is an indicator function. Additionally, we define the region \({{\mathcal{R}}}_{i}\) of atom oi as the set of training samples that satisfy oi, and \({\mathcal{R}}={{\mathcal{R}}}_{1}\cup \ldots \cup {{\mathcal{R}}}_{L}\) as the entire region of the antecedent αx. The deep model then predicts \({{\boldsymbol{\alpha }}}_{w}=\left(\begin{array}{ccc}{w}_{11}&\ldots &{w}_{1L}\\ \ldots &\ldots &\ldots \\ {w}_{K1}&\ldots &{w}_{K\,L}\end{array}\right)\) from αx and minimizes the following loss objective.

$${{\mathcal{L}}}_{w}=\sum _{({{\bf{x}}}_{j},{y}_{j})\in {\mathcal{R}}}CrossEntropyLoss({{\boldsymbol{\alpha }}}_{w}^{T}{{\bf{u}}}^{j},{y}_{j})$$

(9)

This regression tests combinations of atoms in αx on the training dataset, allowing the deep model to learn αw as the relationship between the prediction y and each atom, based on the human knowledge reflected in the labels. Since x naturally satisfies all atoms in αx, we sum αw across the atoms to derive a logit for each class, and then apply a softmax across classes to obtain \(\tilde{p}(y| {{\boldsymbol{\alpha }}}_{x})\), the predicted empirical probability.

$$\tilde{p}({y}| {{\boldsymbol{\alpha }}}_{x})=\frac{\exp \left({\sum }_{i\in [1,L]}{w}_{yi}\right)}{{\sum }_{k\in [1,K]}\exp \left({\sum }_{i\in [1,L]}{w}_{ki}\right)}$$

(10)
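For a single input that satisfies all L atoms, Eq. (10) amounts to summing each class’s row of αw into a logit and applying a softmax over the K classes. A sketch with a toy 2 × 2 weight matrix (the weight values are illustrative):

```python
import math

def predict_from_weights(alpha_w):
    """Eq. (10): sum each class's atom weights into a logit, then take a
    softmax over the K classes. alpha_w is a K x L matrix of atom weights."""
    logits = [sum(row) for row in alpha_w]
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# K = 2 classes, L = 2 atoms: class 0 carries most of the weight.
probs = predict_from_weights([[2.0, 1.0], [0.0, 0.0]])
```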

Subsequently, we use the multi-task learning framework of ref. 31 to train the neural network, ensuring that its predicted probability \(\tilde{p}({y}| {{\boldsymbol{\alpha }}}_{x})\) aligns with the empirical probability \(\hat{p}({y}| {{\boldsymbol{\alpha }}}_{x})\). This alignment is achieved by minimizing the loss specified in the following equation.

$${{\mathcal{L}}}_{r}=\frac{1}{2{\sigma }_{p}^{2}}| | \hat{p}({y}| {\boldsymbol{\alpha }})-\tilde{p}({y}| {\boldsymbol{\alpha }})| {| }^{2}+\frac{1}{2{\sigma }_{n}^{2}}| | {n}_{{\boldsymbol{\alpha }}}-{\tilde{n}}_{{\boldsymbol{\alpha }}}| {| }^{2}+\log {\sigma }_{p}{\sigma }_{n},$$

(11)

where \({\tilde{n}}_{{\boldsymbol{\alpha }}}\) is the predicted coverage given by the neural model, and σp and σn are the standard deviations of the ground-truth probability and coverage, respectively. Finally, we combine the two loss objectives \({{\mathcal{L}}}_{r}\) and \({{\mathcal{L}}}_{w}\), weighted by a hyperparameter λ, to train the neural consequent estimator.

$${{\mathcal{L}}}_{c}={{\mathcal{L}}}_{r}+\lambda {{\mathcal{L}}}_{w}$$

(12)
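For scalar targets, Eqs. (11) and (12) combine into one objective. The sketch below treats the probabilities and coverages as scalars for illustration (the actual estimator operates on vectors, and all numeric values here are made up):

```python
import math

def consequent_loss(p_hat, p_tilde, n, n_tilde, sigma_p, sigma_n, l_w, lam):
    """Scalar sketch of Eqs. (11)-(12): uncertainty-weighted regression terms
    for probability and coverage (L_r), plus the lambda-weighted rule loss L_w."""
    l_r = ((p_hat - p_tilde) ** 2) / (2 * sigma_p ** 2) \
        + ((n - n_tilde) ** 2) / (2 * sigma_n ** 2) \
        + math.log(sigma_p * sigma_n)
    return l_r + lam * l_w

loss = consequent_loss(p_hat=0.9, p_tilde=0.8, n=100, n_tilde=90,
                       sigma_p=1.0, sigma_n=10.0, l_w=0.5, lam=0.1)
```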

Deep antecedent generation p(αx|x)

Deep antecedent generation finds an antecedent αx for the explanation by reshaping the given deep model f. Specifically, we replace the prediction layer of the backbone model f with an explanation generator, so that the latent representation z of input x is mapped to an explanation instead of directly to a prediction (e.g., a class label). We outline the generation process before giving a formal definition. First, we precompute the embedding of each atom by averaging the embeddings of all training instances that satisfy the atom. During both training and inference, the antecedent generator sequentially selects atoms to form an explanation. At each selection step, the input embedding z is combined with the embeddings of the previously selected atoms to form a latent representation h via an encoder. We compute a probability distribution over candidate atoms based on their similarity to h, and an atom is sampled from this distribution using the Gumbel-softmax trick, excluding already selected atoms. This process repeats until a predefined number of atoms is selected, forming the final antecedent.

Formally, given z, the representation of input x in the last hidden layer of f, we generate the explanation αx = (o1, …, oL) with a recursive formulation. Note that this process has a complexity linear in L (Section “Optimization and Time Complexity”). Given z and o1, …, oi−1, we obtain atom oi by

$${{\bf{h}}}_{i}=Encoder([{\bf{z}};{{\bf{o}}}_{1}\ldots ;{{\bf{o}}}_{i-1}]),\quad p({o}_{i}| {\bf{x}},{o}_{1}\ldots ,{o}_{i-1})=\frac{{\mathbb{I}}({o}_{i}\in {\mathcal{C}}({\bf{x}}))\exp ({{\bf{h}}}_{i}^{T}{{\bf{o}}}_{i})}{{\sum }_{\tilde{o}}{\mathbb{I}}(\tilde{o}\in {\mathcal{C}}({\bf{x}}))\exp ({{\bf{h}}}_{i}^{T}\tilde{{\bf{o}}})}$$

(13)

where oi is the embedding of atom oi and Encoder is a neural sequence encoder such as a GRU (ref. 32) or a Transformer (ref. 33). \({\mathbb{I}}\) is the indicator function, and \({\mathcal{C}}({\bf{x}})\) is the set of atom candidates for x. Note that we set the probability of atoms that do not satisfy the global or local constraints to zero. This ensures that only atoms satisfying the specified conditions can be chosen in the subsequent sampling process. We then sample oi from p(oi|x, o1, …, oi−1) in a differentiable way to enable end-to-end training:

$${o}_{i}=Gumbel(p(\tilde{o}\in {\mathcal{C}}({\bf{x}})\subset {\mathcal{O}}\,| \,{\bf{x}},{o}_{1}\ldots ,{o}_{i-1})),\,\,\,p({{\boldsymbol{\alpha }}}_{x}| {\bf{x}})=\prod _{i\in [1,L]}p({o}_{i}| {\bf{x}},{o}_{1}\ldots ,{o}_{i-1})$$

(14)

Gumbel is the Straight-Through Gumbel-Softmax (ref. 34), a differentiable function for sampling discrete values. An atom oi is represented as a one-hot vector of dimension \(| {\mathcal{O}}|\), where \({\mathcal{O}}\) is the set of all atoms that satisfy the hard priors. Then oi is multiplied by the atom embedding matrix to derive the embedding oi.
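The selection loop of Eqs. (13) and (14) can be sketched in plain Python. Here ordinary softmax sampling stands in for the Straight-Through Gumbel-Softmax, `h_of` plays the role of the Encoder, and all embeddings and atom names are illustrative:

```python
import math
import random

def generate_antecedent(h_of, atom_embs, candidates, length, rng=None):
    """Sequential antecedent generation in the spirit of Eqs. (13)-(14):
    score each unchosen candidate atom by exp(h . o), mask out the rest,
    and sample from the normalized scores."""
    rng = rng or random.Random(0)
    chosen = []
    for _ in range(length):
        h = h_of(chosen)                         # context from input + chosen atoms
        scores = {
            atom: math.exp(sum(a * b for a, b in zip(h, emb)))
            for atom, emb in atom_embs.items()
            if atom in candidates and atom not in chosen   # global/local masking
        }
        z = sum(scores.values())
        r, acc = rng.random() * z, 0.0
        for atom, s in scores.items():           # inverse-CDF draw from the softmax
            acc += s
            if acc >= r:
                chosen.append(atom)
                break
    return chosen

embs = {"awesome>=1": [1.0, 0.0], "tasty>=1": [0.0, 1.0], "bad>=1": [0.5, 0.5]}
ante = generate_antecedent(lambda c: [1.0, 1.0], embs, {"awesome>=1", "tasty>=1"}, length=2)
```

Note that "bad>=1" can never be selected because it is absent from the candidate set, mirroring the zeroed-out probabilities in Eq. (13).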

Optimization and time complexity

A deep logic rule reasoning model is learned in two steps. The first step optimizes the neural consequent estimator to learn p(y|αx) by minimizing the loss \({{\mathcal{L}}}_{c}\) in Eq. (12). The second step converts the deep model f to an explainable version by maximizing p(y|x, b) in Eq. (6) with a cross-entropy loss. This is equivalent to minimizing the loss \({{\mathcal{L}}}_{d}={{\mathcal{L}}}_{b}({{\boldsymbol{\alpha }}}_{x}^{(s)})-\log p({y}^{* }| {{\boldsymbol{\alpha }}}_{x}^{(s)})\). Here, \({{\mathcal{L}}}_{b}({{\boldsymbol{\alpha }}}_{x}^{(s)})\) punishes explanations that do not fit humans’ prior preference for rules, while \(-\log p({y}^{* }| {{\boldsymbol{\alpha }}}_{x}^{(s)})\) drives the generator toward an antecedent \({{\boldsymbol{\alpha }}}_{x}^{(s)}\) that leads to the ground-truth class y* with high confidence. We repeat the first and second steps for every training batch. For stable optimization, the parameters of the antecedent generator are frozen during the first step, and the parameters of the consequent estimator are frozen during the second step.
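The alternation described above can be sketched as a loop over batches with two update steps; the step callbacks and their return values below are hypothetical placeholders for the actual optimizer steps:

```python
def train(batches, step_consequent, step_antecedent):
    """Two-step alternation per batch: update the consequent estimator first
    (antecedent generator frozen), then the antecedent generator (consequent
    estimator frozen). The step callbacks return their loss values."""
    history = []
    for batch in batches:
        l_c = step_consequent(batch)   # minimizes L_c of Eq. (12)
        l_d = step_antecedent(batch)   # minimizes L_d, i.e. maximizes Eq. (6)
        history.append((l_c, l_d))
    return history

# Toy usage with placeholder step functions.
hist = train([1, 2, 4], lambda b: 0.5 / b, lambda b: 1.0 / b)
```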

We analyze the per-sample time complexity of the modules in our method. The complexity is \(O(L\cdot | {\mathcal{O}}| )\) for antecedent generation and \(O({L}^{2}+L| {\mathcal{R}}| )\) for the neural consequent estimator. The total time complexity is therefore \(O(L| {\mathcal{O}}| +L| {\mathcal{R}}| )\), since \(L\ll | {\mathcal{O}}|\) and \(L\ll | {\mathcal{R}}|\).


