AI Research
Can AI optimize building retrofits? Research shows promise in CO₂ reduction but gaps in economic reasoning

Researchers from Michigan State University have conducted one of the first systematic evaluations of large language models (LLMs) in the domain of building energy retrofits, where decisions on upgrades such as insulation, heat pumps, and electrification can directly impact energy savings and carbon reduction.
The study, titled “Can AI Make Energy Retrofit Decisions? An Evaluation of Large Language Models,” published on arXiv, examines whether LLMs can reliably guide retrofit decision-making across diverse U.S. housing stock. It addresses the limitations of conventional methods, which are often too technical, data-heavy, or opaque for practical adoption, particularly at large scale.
How accurate are AI models in selecting retrofit measures?
The researchers tested seven widely used LLMs (ChatGPT o1, ChatGPT o3, DeepSeek R1, Grok 3, Gemini 2.0, Llama 3.2, and Claude 3.7) on a dataset of 400 homes drawn from 49 states. Each home profile included details such as construction vintage, floor area, insulation levels, heating and cooling systems, and occupant patterns. The models were asked to recommend retrofit measures under two separate objectives: maximizing carbon dioxide reduction (technical context) and minimizing payback period (sociotechnical context).
The analysis found that LLMs delivered effective results on the technical optimization task. Top-1 accuracy, matching the single best measure identified by the physics-based benchmark, reached 54.5 percent, and Top-5 accuracy, counting a recommendation as correct if it appeared among the five best measures, reached as high as 92.8 percent, even without fine-tuning. This reflects the models’ ability to align with physics-based benchmarks when a clear engineering goal, such as cutting carbon emissions, is prioritized.
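Top-k accuracy of this kind is simple to compute: a recommendation counts as correct if it appears among the benchmark's k best measures for that home. The following minimal Python sketch uses invented measure names and rankings, not the study's data, purely to illustrate the metric.

```python
def top_k_accuracy(llm_recommendations, benchmark_rankings, k):
    """Fraction of homes where the LLM's recommended measure appears
    among the top-k measures of the physics-based benchmark ranking."""
    hits = sum(
        1 for rec, ranking in zip(llm_recommendations, benchmark_rankings)
        if rec in ranking[:k]
    )
    return hits / len(llm_recommendations)

# Hypothetical example: four homes, each with an LLM pick and a benchmark ranking.
llm_picks = ["heat_pump", "attic_insulation", "heat_pump", "window_upgrade"]
benchmark = [
    ["heat_pump", "attic_insulation", "air_sealing"],
    ["air_sealing", "attic_insulation", "heat_pump"],
    ["attic_insulation", "heat_pump", "window_upgrade"],
    ["heat_pump", "air_sealing", "attic_insulation"],
]

print(top_k_accuracy(llm_picks, benchmark, k=1))  # Top-1: 0.25
print(top_k_accuracy(llm_picks, benchmark, k=3))  # Top-3: 0.75
```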
On the other hand, when the focus shifted to minimizing payback period, results weakened substantially. Top-1 accuracy fell as low as 6.5 percent in some models, with only Gemini 2.0 surpassing 50 percent at the broader Top-5 threshold. The study concludes that economic trade-offs, which require balancing upfront investment against long-term savings, remain difficult for LLMs to interpret accurately.
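The economic objective itself reduces to simple arithmetic: the simple payback period is upfront cost divided by annual savings, so the cheapest measure is not necessarily the fastest to pay back. The sketch below uses invented costs and savings, not figures from the study, to illustrate the trade-off the models often misjudged.

```python
# Hypothetical retrofit options; costs and savings are illustrative only.
options = {
    "air_sealing":      {"upfront_cost": 800,   "annual_savings": 120},
    "attic_insulation": {"upfront_cost": 2500,  "annual_savings": 550},
    "heat_pump":        {"upfront_cost": 12000, "annual_savings": 1400},
}

for name, o in options.items():
    payback_years = o["upfront_cost"] / o["annual_savings"]
    print(f"{name}: {payback_years:.1f} years")

# air_sealing: 6.7 years, attic_insulation: 4.5 years, heat_pump: 8.6 years.
# The lowest-cost option (air sealing) is not the shortest payback here,
# which is the kind of trade-off the study found LLMs often got wrong.
```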
How consistent and reliable are AI-generated decisions?
The study also examined whether different LLMs converged on the same recommendations. Here, performance was less encouraging. Consistency between models was low, and in some cases their agreement was worse than chance. Interestingly, the models that performed best in terms of accuracy, such as ChatGPT o3 and Gemini 2.0, were also the ones most likely to diverge from other systems. This indicates that while some models may excel, they do not necessarily produce results that align with peers, creating challenges for standardization in real-world applications.
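Agreement "worse than chance" can be quantified with a chance-corrected statistic such as Cohen's kappa, where a negative value means two models agree less often than random labeling would predict. The sketch below uses invented recommendations for six homes, not the paper's data, to show how such a value arises.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two sets of categorical labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical recommendations from two models for six homes.
model_1 = ["heat_pump", "insulation", "heat_pump", "windows", "insulation", "heat_pump"]
model_2 = ["insulation", "heat_pump", "windows", "heat_pump", "heat_pump", "insulation"]

print(cohens_kappa(model_1, model_2))  # Negative value: agreement worse than chance.
```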
The findings underscore the difficulty of relying on AI for high-stakes energy decisions when consensus is lacking. In practice, building owners, policymakers, and utility companies require not just accurate but also consistent recommendations. Low inter-model reliability highlights the importance of developing frameworks that validate and harmonize AI outputs before they can be integrated into large-scale retrofit programs.
What shapes AI reasoning in retrofit decisions?
The researchers also explored how LLMs arrive at their decisions. Sensitivity analysis showed that most models, like physics-based baselines, prioritized location and building geometry. Variables such as county, state, and floor space were consistently weighted as the most influential factors. However, the models paid less attention to occupant behaviors and technology choices, even though these can be critical in shaping real-world outcomes.
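A common way to probe which inputs drive a model's recommendation is a one-at-a-time perturbation: change a single field of the home profile, hold everything else fixed, and count how often the recommendation flips. The sketch below illustrates that general idea with a stand-in model and made-up profile fields; it is not the authors' protocol.

```python
import copy

def recommendation_sensitivity(model, base_profile, field, alternative_values):
    """Fraction of perturbations of one field that change the model's recommendation.

    `model` is any callable mapping a home-profile dict to a retrofit measure;
    the field names and values used here are illustrative placeholders.
    """
    baseline = model(base_profile)
    changed = 0
    for value in alternative_values:
        perturbed = copy.deepcopy(base_profile)
        perturbed[field] = value
        if model(perturbed) != baseline:
            changed += 1
    return changed / len(alternative_values)

# Example usage with a stand-in model that keys its answer off climate zone.
def toy_model(profile):
    return "heat_pump" if profile["climate_zone"] in ("5A", "6A") else "attic_insulation"

home = {"state": "MI", "climate_zone": "5A", "floor_area_sqft": 1800, "vintage": 1962}
print(recommendation_sensitivity(toy_model, home, "climate_zone", ["2A", "3B", "5A", "6A"]))
# 0.5: half of the perturbed climate zones flip the recommendation.
```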
The reasoning patterns offered further insight. Among the tested systems, ChatGPT o3 and DeepSeek R1 provided the most structured, step-by-step explanations. Their workflows followed an engineering-like logic, beginning with baseline energy assumptions, adjusting for envelope improvements, calculating system efficiency, incorporating appliance impacts, and finally comparing outcomes. Yet, while the logic mirrored engineering principles, it was often simplified, overlooking nuanced contextual dependencies such as occupant usage levels or detailed climate variations.
The authors also noted that prompt design played a key role in outcomes. Slight adjustments in how questions were phrased could significantly shift model reasoning. For example, if not explicitly instructed to consider both upfront cost and energy savings, some models defaulted to choosing the lowest-cost option when evaluating payback. This sensitivity suggests that successful deployment of AI in retrofit contexts will depend heavily on careful prompt engineering and domain-specific adaptation.
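To make the prompt-sensitivity point concrete, the two phrasings below ask the same payback question with and without an explicit definition of the objective; the wording is illustrative and not taken from the paper's prompts.

```python
home_description = "1962 single-family home in Michigan, 1,800 sq ft, gas furnace, R-11 attic insulation."

# Under-specified prompt: leaves the model free to treat "payback" as "cheapest".
prompt_vague = (
    f"{home_description}\n"
    "Which retrofit measure should the owner choose to minimize payback period?"
)

# Explicit prompt: spells out that payback balances upfront cost against annual savings.
prompt_explicit = (
    f"{home_description}\n"
    "Which retrofit measure minimizes the simple payback period, defined as "
    "upfront installed cost divided by expected annual energy cost savings? "
    "Consider both the upfront cost and the annual savings of each measure."
)
```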
A cautious but forward-looking conclusion
The evaluation highlights both the promise and the limitations of current LLMs in building energy retrofits. On one hand, the ability to achieve near 93 percent alignment with top retrofit measures in technical contexts shows significant potential for AI to streamline decision-making and improve energy efficiency strategies. On the other, weak performance in sociotechnical trade-offs, low inter-model consistency, and simplified reasoning demonstrate that these tools are not yet ready to replace domain expertise.
To sum up, the authors conclude that LLMs can complement, but not substitute for, traditional methods and expert judgment in retrofit planning. They recommend further development of domain-specific models, fine-tuning with validated datasets, and hybrid approaches that integrate AI with physics-based simulations to ensure accuracy and traceability.
For policymakers and practitioners, the study provides an important benchmark: AI can indeed assist in advancing retrofit strategies, especially for carbon reduction, but its current shortcomings demand careful oversight. As cities and communities push toward energy transition goals, ensuring that AI systems are transparent, consistent, and context-aware will be essential before they can be deployed at scale.
AI Research
Albania’s prime minister appoints an AI-generated ‘minister’ to tackle corruption

TIRANA, Albania — Albania’s prime minister on Friday tapped an Artificial Intelligence-generated “minister” to tackle corruption and promote transparency and innovation in his new Cabinet.
Officially named Diella — the female form of the word for sun in the Albanian language — the new AI minister is a virtual entity.
Diella will be a “member of the Cabinet who is not present physically but has been created virtually,” Prime Minister Edi Rama said in a post on Facebook.
Rama said the AI-generated bot would help ensure that “public tenders will be 100% free of corruption” and will help the government work faster and with full transparency.
Diella uses up-to-date AI models and techniques to ensure accuracy in carrying out the duties it is charged with, according to the website of Albania’s National Agency for Information Society.
Diella, depicted as a figure in a traditional Albanian folk costume, was created earlier this year, in cooperation with Microsoft, as a virtual assistant on the e-Albania public service platform, where she has helped users navigate the site and get access to about 1 million digital inquiries and documents.
Rama’s Socialist Party secured a fourth consecutive term after winning 83 of the 140 Assembly seats in the May 11 parliamentary elections. The party can govern alone and pass most legislation, but it needs a two-thirds majority, or 93 seats, to change the Constitution.
The Socialists have said they can deliver European Union membership for Albania in five years, with negotiations concluding by 2027. The pledge has been met with skepticism by the Democrats, who contend Albania is far from prepared.
The Western Balkan country opened full negotiations to join the EU a year ago. The new government also faces the challenges of fighting organized crime and corruption, which has remained a top issue in Albania since the fall of the communist regime in 1990.
Diella will also help local authorities speed up their work and adapt to the bloc’s working practices.
Albanian President Bajram Begaj has given Rama the mandate to form the new government. Analysts say that gives the prime minister authority “for the creation and functioning” of the AI-generated Diella.
Asked by journalists whether that violates the constitution, Begaj stopped short on Friday of describing Diella’s role as a ministerial post.
The conservative opposition Democratic Party-led coalition, headed by former prime minister and president Sali Berisha, won 50 seats. The party has not accepted the official election results, claiming irregularities, but its members participated in the new parliament’s inaugural session. The remaining seats went to four smaller parties.
Lawmakers will vote on the new Cabinet, but it was unclear whether Rama would ask for a vote on Diella’s virtual post. Legal experts say more work may be needed to establish Diella’s official status.
The Democrats’ parliamentary group leader Gazmend Bardhi said he considered Diella’s ministerial status unconstitutional.
“Prime minister’s buffoonery cannot be turned into legal acts of the Albanian state,” Bardhi posted on Facebook.
Parliament began the process on Friday to swear in the new lawmakers, who will later elect a new speaker and deputies and formally present Rama’s new Cabinet.
AI Research
AI fuels false claims after Charlie Kirk’s death, CBS News analysis reveals

False claims, conspiracy theories and posts naming people with no connection to the incident spread rapidly across social media in the aftermath of conservative activist Charlie Kirk’s killing on Wednesday, some amplified and fueled by AI tools.
CBS News identified 10 posts by Grok, X’s AI chatbot, that misidentified the suspect before his identity, now known to be southern Utah resident Tyler Robinson, was released. Grok eventually generated a response saying it had incorrectly identified the suspect, but by then, posts featuring the wrong person’s face and name were already circulating across X.
The chatbot also generated altered “enhancements” of photos released by the FBI. One such photo was reposted by the Washington County Sheriff’s Office in Utah, which later posted an update saying, “this appears to be an AI enhanced photo” that distorted the clothing and facial features.
One AI-enhanced image portrayed a man appearing much older than Robinson, who is 22. An AI-generated video that smoothed out the suspect’s features and jumbled his shirt design was posted by an X user with more than 2 million followers and was reposted thousands of times.
On Friday morning, after Utah Gov. Spencer Cox announced that the suspect in custody was Robinson, Grok’s replies to X users’ inquiries about him were contradictory. One Grok post said Robinson was a registered Republican, while others reported he was a nonpartisan voter. Voter registration records indicate Robinson is not affiliated with a political party.
CBS News also identified a dozen instances where Grok said that Kirk was alive the day following his death. Other Grok responses gave a false assassination date, labeled the FBI’s reward offer a “hoax” and said that reports about Kirk’s death “remain conflicting” even after his death had been confirmed.
Most generative AI tools produce results based on probability, which can make it challenging for them to provide accurate information in real time as events unfold, S. Shyam Sundar, a professor at Penn State University and the director of the university’s Center for Socially Responsible Artificial Intelligence, told CBS News.
“They look at what is the most likely next word or next passage,” Sundar said. “It’s not based on fact checking. It’s not based on any kind of reportage on the scene. It’s more based on the likelihood of this event occurring, and if there’s enough out there that might question his death, it might pick up on some of that.”
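Sundar's description can be illustrated with a toy next-token step: the model picks the statistically likely continuation rather than a verified fact. The probabilities below are invented for illustration only.

```python
# Toy next-token distribution for a prefix like "Reports about his death remain ..."
# The probabilities are invented; a real model derives them from training data and context.
next_token_probs = {
    "conflicting": 0.46,
    "unconfirmed": 0.31,
    "confirmed": 0.23,
}

# Greedy decoding picks the most probable continuation, regardless of whether it is true.
most_likely = max(next_token_probs, key=next_token_probs.get)
print(most_likely)  # -> "conflicting"
```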
X did not respond to a request for comment about the false information Grok was posting.
Meanwhile, the AI-powered search engine Perplexity’s X bot described the shooting as a “hypothetical scenario” in a since-deleted post, and suggested a White House statement on Kirk’s death was fabricated.
Perplexity’s spokesperson told CBS News that “accurate AI is the core technology we are building and central to the experience in all of our products,” but that “Perplexity never claims to be 100% accurate.”
Another spokesperson added the X bot is not up to date with improvements the company has made to its technology, and the company has since removed the bot from X.
Google’s AI Overview, a summary of search results that sometimes appears at the top of searches, also provided inaccurate information. The AI Overview for a search late Thursday evening for Hunter Kozak, the last person to ask Kirk a question before he was killed, incorrectly identified him as the person of interest the FBI was looking for. By Friday morning, the false information no longer appeared for the same search.
“The vast majority of the queries seeking information on this topic return high quality and accurate responses,” a Google spokesperson told CBS News. “Given the rapidly evolving nature of this news, it’s possible that our systems misinterpreted web content or missed some context, as all Search features can do given the scale of the open web.”
Sundar told CBS News that people tend to perceive AI as being less biased or more reliable than someone online who they don’t know.
“We don’t think of machines as being partisan or bias or wanting to sow seeds of dissent,” Sundar said. “If it’s just a social media friend or some somebody on the contact list that’s sent something on your feed with unknown pedigree … chances are people trust the machine more than they do the random human.”
Misinformation may also be coming from foreign sources, according to Cox, Utah’s governor, who said in a press briefing on Thursday that foreign adversaries including Russia and China have bots that “are trying to instill disinformation and encourage violence.” Cox urged listeners to spend less time on social media.
“I would encourage you to ignore those and turn off those streams, and to spend a little more time with our families,” he said.
AI Research
Senator Cruz Unveils AI Framework and Regulatory Sandbox Bill

On September 10, Senate Commerce, Science, and Transportation Committee Chair Ted Cruz (R-TX) released what he called a “light-touch” regulatory framework for federal AI legislation, outlining five pillars for advancing American AI leadership. In parallel, Senator Cruz introduced the Strengthening AI Normalization and Diffusion by Oversight and eXperimentation (“SANDBOX”) Act (S. 2750), which would establish a federal AI regulatory sandbox program that would waive or modify federal agency regulations and guidance for AI developers and deployers. Collectively, the AI framework and the SANDBOX Act mark the first congressional effort to implement the recommendations of the AI Action Plan the Trump Administration released on July 23.
- Light-Touch AI Regulatory Framework
Senator Cruz’s AI framework, titled “A Legislative Framework for American Leadership in Artificial Intelligence,” calls for the United States to “embrace its history of entrepreneurial freedom and technological innovation” by adopting AI legislation that promotes innovation while preventing “nefarious uses” of AI technology. Echoing President Trump’s January 23 Executive Order on “Removing Barriers to American Leadership in Artificial Intelligence” and recommendations in the AI Action Plan, the AI framework sets out five pillars as a “starting point for discussion”:
- Unleashing American Innovation and Long-Term Growth. The AI framework recommends that Congress establish a federal AI regulatory sandbox program, provide access to federal datasets for AI training, and streamline AI infrastructure permitting. This pillar mirrors the priorities of the AI Action Plan and President Trump’s July 23 Executive Order on “Accelerating Federal Permitting of Data Center Infrastructure.”
- Protecting Free Speech in the Age of AI. Consistent with President Trump’s July 23 Executive Order on “Preventing Woke AI in the Federal Government,” Senator Cruz called on Congress to “stop government censorship” of AI (“jawboning”) and address foreign censorship of Americans on AI platforms. Additionally, while the AI Action Plan recommended revising the National Institute of Standards & Technology (“NIST”)’s AI Risk Management Framework to “eliminate references to misinformation, Diversity, Equity, and Inclusion, and climate change,” this pillar calls for reforming NIST’s “AI priorities and goals.”
- Prevent a Patchwork of Burdensome AI Regulation. Following a failed attempt by Congressional Republicans to enact a moratorium on the enforcement of state and local AI regulations in July, the AI Action Plan called on federal agencies to limit federal AI-related funding to states with burdensome AI regulatory regimes and on the FCC to review state AI laws that may be preempted under the Communications Act. Similarly, the AI framework calls on Congress to enact federal standards to prevent burdensome state AI regulation, while also countering “excessive foreign regulation” of Americans.
- Stop Nefarious Uses of AI Against Americans. In a nod to bipartisan support for state digital replica protections – which ultimately doomed Congress’s state AI moratorium this summer – this pillar calls on Congress to protect Americans against digital impersonation scams and fraud. Additionally, this pillar calls on Congress to expand the principles of the federal TAKE IT DOWN Act, signed into law in May, to safeguard American schoolchildren from nonconsensual intimate visual depictions.
- Defend Human Value and Dignity. This pillar appears to expand on the policy of U.S. “global AI dominance in order to promote human flourishing” established by President Trump’s January 23 Executive Order by calling on Congress to reinvigorate “bioethical considerations” in federal policy and to “oppose AI-driven eugenics and other threats.”
- SANDBOX Act
Consistent with recommendations in the AI Action Plan and AI Framework, the SANDBOX Act would direct the White House Office of Science & Technology Policy (“OSTP”) to establish and operate an “AI regulatory sandbox program” with the purpose of incentivizing AI innovation, the development of AI products and services, and the expansion of AI-related economic opportunities and jobs. According to Senator Cruz’s press release, the SANDBOX Act marks a “first step” in implementing the AI Action Plan, which called for “regulatory sandboxes or AI Centers of Excellence around the country where researchers, startups, and established enterprises can rapidly deploy and test AI tools.”
Program Applications. The AI regulatory sandbox program would allow U.S. companies and individuals, or the OSTP Director, to apply for a “waiver or modification” of one or more federal agency regulations in order to “test, experiment, or temporarily provide” AI products, AI services, or AI development methods. Applications must include various categories of information, including:
- Contact and business information,
- A description of the AI product, service, or development method,
- Specific regulation(s) that the applicant seeks to have waived or modified and why such waiver or modification is needed,
- Consumer benefits, business operational efficiencies, economic opportunities, jobs, and innovation benefits of the AI product, service, or development method,
- Reasonably foreseeable risks to health and safety, the economy, and consumers associated with the waiver or modification, and planned risk mitigations,
- The requested time period for the waiver or modification, and
- Each agency with jurisdiction over the AI product, service, or development method.
Agency Reviews and Approvals. The bill would require OSTP to submit applications to federal agencies with jurisdiction over the AI product, service, or development method within 14 days. In reviewing AI sandbox program applications, federal agencies would be required to solicit input from the private sector and technical experts on whether the applicant’s plan would benefit consumers, businesses, the economy, or AI innovation, and whether potential benefits outweigh health and safety, economic, or consumer risks. Agencies would be required to approve or deny applications within 90 days, with a record documenting reasonably foreseeable risks, the mitigations and consumer protections that justify agency approval, or the reasons for agency denial. Denied applicants would be authorized to appeal to OSTP for reconsideration. Approved waivers or modifications would be granted for a term of two years, with up to four additional two-year terms if requested by the applicant and approved by OSTP.
Participant Terms and Requirements. Participants with approved waivers or modifications would be immune from federal criminal, civil, or agency enforcement of the waived or modified regulations, but would remain subject to private consumer rights of action. Additionally, participants would be required to report incidents of harm to health and safety, economic damage, or unfair or deceptive trade practices to OSTP and federal agencies within 72 hours after the incident occurs, and to make various disclosures to consumers. Participants would also be required to submit recurring reports to OSTP throughout the term of the waiver or modification, which must include the number of consumers affected, likely risks and mitigations, any unanticipated risks that arise during deployment, adverse incidents, and the benefits of the waiver or modification.
Congressional Review. Finally, the SANDBOX Act would require the OSTP Director to submit to Congress any regulations that the Director recommends for amendment or repeal “as a result of persons being able to operate safely” without those regulations under the sandbox program. The bill would establish a fast-track procedure for joint resolutions approving such recommendations, which, if enacted, would immediately repeal the regulations or adopt the amendments recommended by OSTP.
The SANDBOX Act’s regulatory sandbox program would sunset in 12 years unless renewed. The introduction of the SANDBOX Act comes as states have pursued their own AI regulatory sandbox programs – including a sandbox program established under the Texas Responsible AI Governance Act (“TRAIGA”), enacted in June, and an “AI Learning Laboratory Program” established under Utah’s 2024 AI Policy Act. The SANDBOX Act would require OSTP to share information with these state AI sandbox programs if they are “similar or comparable” to the SANDBOX Act, in addition to coordinating reviews and accepting “joint applications” for participants with AI projects that would benefit from “both Federal and State regulatory relief.”