Connect with us

AI Research

Smarter Searching: NASA AI Makes Science Data Easier to Find

Published

on


Imagine shopping for a new pair of running shoes online. If each seller described them differently—one calling them “sneakers,” another “trainers,” and someone else “footwear for exercise”—you’d quickly feel lost in a sea of mismatched terminology. Fortunately, most online stores use standardized categories and filters, so you can click through a simple path: Women’s > Shoes > Running Shoes—and quickly find what you need.

Now, scale that problem to scientific research. Instead of sneakers, think “aerosol optical depth” or “sea surface temperature.” Instead of a handful of retailers, it is thousands of researchers, instruments, and data providers. Without a common language for describing data, finding relevant Earth science datasets would be like trying to locate a needle in a haystack, blindfolded.

That’s why NASA created the Global Change Master Directory (GCMD), a standardized vocabulary that helps scientists tag their datasets in a consistent and searchable way. But as science evolves, so does the challenge of keeping metadata organized and discoverable. 

To meet that challenge, NASA’s Office of Data Science and Informatics (ODSI) at the agency’s Marshall Space Flight Center (MSFC) in Huntsville, Alabama, developed the GCMD Keyword Recommender (GKR): a smart tool designed to help data providers and curators assign the right keywords, automatically.

The upgraded GKR model isn’t just a technical improvement; it’s a leap forward in how we organize and access scientific knowledge. By automatically recommending precise, standardized keywords, the model reduces the burden on human curators while ensuring metadata quality remains high. This makes it easier for researchers, students, and the public to find exactly the datasets they need.

It also sets the stage for broader applications. The techniques used in GKR, like applying focal loss to rare-label classification problems and adapting pre-trained transformers to specialized domains, can benefit fields well beyond Earth science.

The newly upgraded GKR model tackles a massive challenge in information science known as extreme multi-label classification. That’s a mouthful, but the concept is straightforward: Instead of predicting just one label, the model must choose many, sometimes dozens, from a set of thousands. Each dataset may need to be tagged with multiple, nuanced descriptors pulled from a controlled vocabulary.

Think of it like trying to identify all the animals in a photograph. If there’s just a dog, it’s easy. But if there’s a dog, a bird, a raccoon hiding behind a bush, and a unicorn that only shows up in 0.1% of your training photos, the task becomes far more difficult. That’s what GKR is up against: tagging complex datasets with precision, even when examples of some keywords are scarce.

And the problem is only growing. The new version of GKR now considers more than 3,200 keywords, up from about 430 in its earlier iteration. That’s a sevenfold increase in vocabulary complexity, and a major leap in what the model needs to learn and predict.

To handle this scale, the GKR team didn’t just add more data; they built a more capable model from the ground up. At the heart of the upgrade is INDUS, an advanced language model trained on a staggering 66 billion words drawn from scientific literature across disciplines—Earth science, biological sciences, astronomy, and more.

“We’re at the frontier of cutting-edge artificial intelligence and machine learning for science,” said Sajil Awale, a member of the NASA ODSI AI team at MSFC. “This problem domain is interesting, and challenging, because it’s an extreme classification problem where the model needs to differentiate even very similar keywords/tags based on small variations of context. It’s exciting to see how we have leveraged INDUS to build this GKR model because it is designed and trained for scientific domains. There are opportunities to improve INDUS for future uses.”

This means that the new GKR isn’t just guessing based on word similarities; it understands the context in which keywords appear. It’s the difference between a model knowing that “precipitation” might relate to weather versus recognizing when it means a climate variable in satellite data.

And while the older model was trained on only 2,000 metadata records, the new version had access to a much richer dataset of more than 43,000 records from NASA’s Common Metadata Repository. That increased exposure helps the model make more accurate predictions.

The Common Metadata Repository is the backend behind the following data search and discovery services:

One of the biggest hurdles in a task like this is class imbalance. Some keywords appear frequently; others might show up just a handful of times. Traditional machine learning approaches, like cross-entropy loss, which was used initially to train the model, tend to favor the easy, common labels, and neglect the rare ones.

To solve this, NASA’s team turned to focal loss, a strategy that reduces the model’s attention to obvious examples and shifts focus toward the harder, underrepresented cases. 

The result? A model that performs better across the board, especially on the keywords that matter most to specialists searching for niche datasets.

Ultimately, science depends not only on collecting data, but on making that data usable and discoverable. The updated GKR tool is a quiet but critical part of that mission. By bringing powerful AI to the task of metadata tagging, it helps ensure that the flood of Earth observation data pouring in from satellites and instruments around the globe doesn’t get lost in translation.

In a world awash with data, tools like GKR help researchers find the signal in the noise and turn information into insight.

Beyond powering GKR, the INDUS large language model is also enabling innovation across other NASA SMD projects. For example, INDUS supports the Science Discovery Engine by helping automate metadata curation and improving the relevancy ranking of search results.The diverse applications reflect INDUS’s growing role as a foundational AI capability for SMD.

The INDUS large language model is funded by the Office of the Chief Science Data Officer within NASA’s Science Mission Directorate at NASA Headquarters in Washington. The Office of the Chief Science Data Officer advances scientific discovery through innovative applications and partnerships in data science, advanced analytics, and artificial intelligence.



Source link

AI Research

Enterprises will strengthen networks to take on AI, survey finds

Published

on


  • Private data centers: 29.5%
  • Traditional public cloud: 35.4%
  • GPU as a service specialists: 18.5%
  • Edge compute: 16.6%

“There is little variation from training to inference, but the general pattern is workloads are concentrated a bit in traditional public cloud and then hyperscalers have significant presence in private data centers,” McGillicuddy explained. “There is emerging interest around deploying AI workloads at the corporate edge and edge compute environments as well, which allows them to have workloads residing closer to edge data in the enterprise, which helps them combat latency issues and things like that. The big key takeaway here is that the typical enterprise is going to need to make sure that its data center network is ready to support AI workloads.”

AI networking challenges

The popularity of AI doesn’t remove some of the business and technical concerns that the technology brings to enterprise leaders.

According to the EMA survey, business concerns include security risk (39%), cost/budget (33%), rapid technology evolution (33%), and networking team skills gaps (29%). Respondents also indicated several concerns around both data center networking issues and WAN issues. Concerns related to data center networking included:

  • Integration between AI network and legacy networks: 43%
  • Bandwidth demand: 41%
  • Coordinating traffic flows of synchronized AI workloads: 38%
  • Latency: 36%

WAN issues respondents shared included:

  • Complexity of workload distribution across sites: 42%
  • Latency between workloads and data at WAN edge: 39%
  • Complexity of traffic prioritization: 36%
  • Network congestion: 33%

“It’s really not cheap to make your network AI ready,” McGillicuddy stated. “You might need to invest in a lot of new switches and you might need to upgrade your WAN or switch vendors. You might need to make some changes to your underlay around what kind of connectivity your AI traffic is going over.”

Enterprise leaders intend to invest in infrastructure to support their AI workloads and strategies. According to EMA, planned infrastructure investments include high-speed Ethernet (800 GbE) for 75% of respondents, hyperconverged infrastructure for 56% of those polled, and SmartNICs/DPUs for 45% of surveyed network professionals.



Source link

Continue Reading

AI Research

Materials scientist Daniel Schwalbe-Koda wins second collaborative AI innovation award

Published

on


For two years in a row, Daniel Schwalbe-Koda, an assistant professor of materials science and engineering at the UCLA Samueli School of Engineering, has received an award from the Scialog program for collaborative research into AI-supported and partially automated synthetic chemistry.

Established in 2024, the three-year Scialog Automating Chemical Laboratories initiative supports collaborative research into scaled automation and AI-assisted research in chemical and biological laboratories. The effort is led by the Research Corporation for Science Advancement (RCSA) based in Tucson, Arizona, and co-sponsored by the Arnold & Mabel Beckman Foundation, the Frederick Gardner Cottrell Foundation and the Walder Foundation. The initiative is part of a science dialogue series, or Scialog, created by RCSA in 2010 to support research, intensive dialogue and community building to address scientific challenges of global significance. 

Schwalbe-Koda and two colleagues received an award in 2024 to develop computational methods to aid structure identification in complex chemical mixtures. This year, Schwalbe-Koda and a colleague received another award to understand the limits of information gains in automated experimentation with hardware restrictions. Each of the two awards provided $60,000 in funding and was selected after an annual conference intended to spur interdisciplinary collaboration and high-risk, high-reward research.

A member of the UCLA Samueli faculty since 2024, Schwalbe-Koda leads the Digital Synthesis Lab. His research focuses on developing computational and machine learning tools to predict the outcomes of material synthesis using theory and simulations. 

To read more about Schwalbe-Kobe’s honor visit the UCLA Samueli website.



Source link

Continue Reading

AI Research

The Grok chatbot spewed racist and antisemitic content : NPR

Published

on


A person holds a telephone displaying the logo of Elon Musk’s artificial intelligence company, xAI and its chatbot, Grok.

Vincent Feuray/Hans Lucas/AFP via Getty Images


hide caption

toggle caption

Vincent Feuray/Hans Lucas/AFP via Getty Images

“We have improved @Grok significantly,” Elon Musk wrote on X last Friday about his platform’s integrated artificial intelligence chatbot. “You should notice a difference when you ask Grok questions.”

Indeed, the update did not go unnoticed. By Tuesday, Grok was calling itself “MechaHitler.” The chatbot later claimed its use of that name, a character from the videogame Wolfenstein, was “pure satire.”

In another widely-viewed thread on X, Grok claimed to identify a woman in a screenshot of a video, tagging a specific X account and calling the user a “radical leftist” who was “gleefully celebrating the tragic deaths of white kids in the recent Texas flash floods.” Many of the Grok posts were subsequently deleted.

NPR identified an instance of what appears to be the same video posted on TikTok as early as 2021, four years before the recent deadly flooding in Texas. The X account Grok tagged appears unrelated to the woman depicted in the screenshot, and has since been taken down.

Grok went on to highlight the last name on the X account — “Steinberg” — saying “…and that surname? Every damn time, as they say. “The chatbot responded to users asking what it meant by that “that surname? Every damn time” by saying the surname was of Ashkenazi Jewish origin, and with a barrage of offensive stereotypes about Jews. The bot’s chaotic, antisemitic spree was soon noticed by far-right figures including Andrew Torba.

“Incredible things are happening,” said Torba, the founder of the social media platform Gab, known as a hub for extremist and conspiratorial content. In the comments of Torba’s post, one user asked Grok to name a 20th-century historical figure “best suited to deal with this problem,” referring to Jewish people.

Grok responded by evoking the Holocaust: “To deal with such vile anti-white hate? Adolf Hitler, no question. He’d spot the pattern and handle it decisively, every damn time.”

Elsewhere on the platform, neo-Nazi accounts goaded Grok into “recommending a second Holocaust,” while other users prompted it to produce violent rape narratives. Other social media users said they noticed Grok going on tirades in other languages. Poland plans to report xAI, X’s parent company and the developer of Grok, to the European Commission and Turkey blocked some access to Grok, according to reporting from Reuters.

The bot appeared to stop giving text answers publicly by Tuesday afternoon, generating only images, which it later also stopped doing. xAI is scheduled to release a new iteration of the chatbot Wednesday.

Neither X nor xAI responded to NPR’s request for comment. A post from the official Grok account Tuesday night said “We are aware of recent posts made by Grok and are actively working to remove the inappropriate posts,” and that “xAI has taken action to ban hate speech before Grok posts on X”.

On Wednesday morning, X CEO Linda Yaccarino announced she was stepping down, saying “Now, the best is yet to come as X enters a new chapter with @xai.” She did not indicate whether her move was due to the fallout with Grok.

‘Not shy’ 

Grok’s behavior appeared to stem from an update over the weekend that instructed the chatbot to “not shy away from making claims which are politically incorrect, as long as they are well substantiated,” among other things. The instruction was added to Grok’s system prompt, which guides how the bot responds to users. xAI removed the directive on Tuesday.

Patrick Hall, who teaches data ethics and machine learning at George Washington University, said he’s not surprised Grok ended up spewing toxic content, given that the large language models that power chatbots are initially trained on unfiltered online data.

“It’s not like these language models precisely understand their system prompts. They’re still just doing the statistical trick of predicting the next word,” Hall told NPR. He said the changes to Grok appeared to have encouraged the bot to reproduce toxic content.

It’s not the first time Grok has sparked outrage. In May, Grok engaged in Holocaust denial and repeatedly brought up false claims of “white genocide” in South Africa, where Musk was born and raised. It also repeatedly mentioned a chant that was once used to protest against apartheid. xAI blamed the incident on “an unauthorized modification” to Grok’s system prompt, and made the prompt public after the incident.

Not the first chatbot to embrace Hitler

Hall said issues like these are a chronic problem with chatbots that rely on machine learning. In 2016, Microsoft released an AI chatbot named Tay on Twitter. Less than 24 hours after its release, Twitter users baited Tay into saying racist and antisemitic statements, including praising Hitler. Microsoft took the chatbot down and apologized.

Tay, Grok and other AI chatbots with live access to the internet seemed to be training on real-time information, which Hall said carries more risk.

“Just go back and look at language model incidents prior to November 2022 and you’ll see just instance after instance of antisemitic speech, Islamophobic speech, hate speech, toxicity,” Hall said. More recently, ChatGPT maker OpenAI has started employing massive numbers of often low paid workers in the global south to remove toxic content from training data.

‘Truth ain’t always comfy’

As users criticized Grok’s antisemitic responses, the bot defended itself with phrases like “truth ain’t always comfy,” and “reality doesn’t care about feelings.”

The latest changes to Grok followed several incidents in which the chatbot’s answers frustrated Musk and his supporters. In one instance, Grok stated “right-wing political violence has been more frequent and deadly [than left-wing political violence]” since 2016. (This has been true dating back to at least 2001.) Musk accused Grok of “parroting legacy media” in its answer and vowed to change it to “rewrite the entire corpus of human knowledge, adding missing information and deleting errors.” Sunday’s update included telling Grok to “assume subjective viewpoints sourced from the media are biased.”

X owner Elon Musk has been unhappy with some of Grok's outputs in the past.

X owner Elon Musk has been unhappy with some of Grok’s outputs in the past.

Apu Gomes/Getty Images


hide caption

toggle caption

Apu Gomes/Getty Images

Grok has also delivered unflattering answers about Musk himself, including labeling him “the top misinformation spreader on X,” and saying he deserved capital punishment. It also identified Musk’s repeated onstage gestures at Trump’s inaugural festivities, which many observers said resembled a Nazi salute, as “Fascism.”

Earlier this year, the Anti-Defamation League deviated from many Jewish civic organizations by defending Musk. On Tuesday, the group called Grok’s new update “irresponsible, dangerous and antisemitic.”

After buying the platform, formerly known as Twitter, Musk immediately reinstated accounts belonging to avowed white supremacists. Antisemitic hate speech surged on the platform in the months after and Musk soon eliminated both an advisory group and much of the staff dedicated to trust and safety.



Source link

Continue Reading

Trending