AI Research

RenderFormer: How neural networks are reshaping 3D rendering

3D rendering—the process of converting three-dimensional models into two-dimensional images—is a foundational technology in computer graphics, widely used across gaming, film, virtual reality, and architectural visualization. Traditionally, this process has depended on physics-based techniques like ray tracing and rasterization, which simulate light behavior through mathematical formulas and expert-designed models.

Now, thanks to advances in AI, especially neural networks, researchers are beginning to replace these conventional approaches with machine learning (ML). This shift is giving rise to a new field known as neural rendering.

Neural rendering combines deep learning with traditional graphics techniques, allowing models to simulate complex light transport without explicitly modeling physical optics. This approach offers significant advantages: it eliminates the need for handcrafted rules, supports end-to-end training, and can be optimized for specific tasks. Yet, most current neural rendering methods rely on 2D image inputs, lack support for raw 3D geometry and material data, and often require retraining for each new scene—limiting their generalizability.

RenderFormer: Toward a general-purpose neural rendering model

To overcome these limitations, researchers at Microsoft Research have developed RenderFormer, a new neural architecture designed to support full-featured 3D rendering using only ML—no traditional graphics computation required. RenderFormer is the first model to demonstrate that a neural network can learn a complete graphics rendering pipeline, including support for arbitrary 3D scenes and global illumination, without relying on ray tracing or rasterization. This work has been accepted at SIGGRAPH 2025 and is open-sourced on GitHub.

Architecture overview

As shown in Figure 1, RenderFormer represents the entire 3D scene using triangle tokens—each one encoding spatial position, surface normal, and physical material properties such as diffuse color, specular color, and roughness. Lighting is also modeled as triangle tokens, with emission values indicating intensity.
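The triangle-token packing described above can be sketched as follows. This is an illustrative simplification: the feature layout is an assumption, and the model actually applies learned Linear + Norm embeddings rather than the raw concatenation shown here.

```python
import numpy as np

def triangle_token(v0, v1, v2, diffuse, specular, roughness, emission):
    """Pack one triangle's attributes into a flat feature vector.

    Hypothetical sketch: RenderFormer embeds these attributes with learned
    Linear + Norm layers; here we just concatenate the raw values.
    """
    v0, v1, v2 = (np.asarray(v, dtype=np.float64) for v in (v0, v1, v2))
    # Surface normal from the cross product of two edge vectors.
    n = np.cross(v1 - v0, v2 - v0)
    n = n / np.linalg.norm(n)
    # Token = vertex positions (9) + normal (3) + diffuse (3) + specular (3)
    #       + roughness (1) + emission (3) = 22 features.
    return np.concatenate([v0, v1, v2, n, diffuse, specular, [roughness], emission])

tok = triangle_token([0, 0, 0], [1, 0, 0], [0, 1, 0],
                     diffuse=[0.8, 0.2, 0.2], specular=[0.04] * 3,
                     roughness=0.5, emission=[0.0] * 3)
print(tok.shape)  # (22,)
```

A non-emissive surface triangle simply carries zero emission, while a light source is the same kind of token with nonzero emission values.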

Figure 1: The figure illustrates the architecture of RenderFormer. It includes a Triangle Mesh Scene with a 3D rabbit model inside a colored cube, a Camera Ray Map grid, a View Independent Transformer (12 layers of Self-Attention and Feed Forward Network), a View Dependent Transformer (6 layers with Cross-Attention and Self-Attention), and a DPT Decoder. Scene attributes—Vertex Normal, Reflectance (Diffuse, Specular, Roughness), Emission, and Position—are embedded into Triangle Tokens via Linear + Norm operations. These tokens and Ray Bundle Tokens (from the Camera Ray Map) are processed by the respective transformers and decoded to produce a rendered image of a glossy rabbit in a colored room.
Figure 1. Architecture of RenderFormer

To describe the viewing direction, the model uses ray bundle tokens derived from a ray map—each pixel in the output image corresponds to one of these rays. To improve computational efficiency, pixels are grouped into rectangular blocks, with all rays in a block processed together.

The model outputs a set of tokens that are decoded into image pixels, completing the rendering process entirely within the neural network.
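The grouping of per-pixel rays into ray bundle tokens might look like the sketch below. The simple pinhole ray map and the 8×8 block size are assumptions for illustration, not values from the paper.

```python
import numpy as np

def ray_bundle_tokens(height, width, block=8):
    """Group per-pixel ray directions into one token per pixel block.

    Illustrative only: the block size and the unit-focal pinhole camera
    are assumptions made for this sketch.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    rays = np.stack([xs, ys, np.ones_like(xs)], axis=-1)          # (H, W, 3)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)          # unit rays
    # Reshape (H, W, 3) -> (H/b * W/b, b*b*3): one flat token per block.
    h, w = height // block, width // block
    tokens = (rays.reshape(h, block, w, block, 3)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(h * w, block * block * 3))
    return tokens

tokens = ray_bundle_tokens(256, 256)
print(tokens.shape)  # (1024, 192)
```

At 256×256 output resolution this yields 1,024 ray bundle tokens, a far shorter sequence than one token per pixel, which is the point of the block grouping.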

Dual-branch design for view-independent and view-dependent effects

The RenderFormer architecture is built around two transformers: one for view-independent features and another for view-dependent ones.

  • The view-independent transformer captures scene information unrelated to viewpoint, such as shadowing and diffuse light transport, using self-attention between triangle tokens.
  • The view-dependent transformer models effects like visibility, reflections, and specular highlights through cross-attention between triangle and ray bundle tokens.

Additional image-space effects, such as anti-aliasing and screen-space reflections, are handled via self-attention among ray bundle tokens.
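Both branches are built from standard attention. A minimal single-head sketch of the cross-attention between ray bundle tokens (queries) and triangle tokens (keys/values) follows; the multi-head projections, layer norms, and feed-forward layers of the real model are omitted, and the token counts and feature width are illustrative.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: the building block both
    transformer branches share (projections and norms omitted)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
tri_tokens = rng.normal(size=(1536, 64))   # stand-in for triangle tokens
ray_tokens = rng.normal(size=(1024, 64))   # one query per pixel block
out, w = cross_attention(ray_tokens, tri_tokens, tri_tokens)
print(out.shape)  # (1024, 64)
```

The attention weights `w` are exactly the maps visualized in Figure 3: each row shows which triangles a given ray bundle attends to.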

To validate the architecture, the team conducted ablation studies and visual analyses, confirming the importance of each component in the rendering pipeline.

Table 1: A table comparing the performance of different network variants in an ablation study. The columns are labeled Variant, PSNR (↑), SSIM (↑), LPIPS (↓), and FLIP (↓).
Table 1. Ablation study analyzing the impact of different components and attention mechanisms on the final performance of the trained network.

To test the capabilities of the view-independent transformer, researchers trained a decoder to produce diffuse-only renderings. The results, shown in Figure 2, demonstrate that the model can accurately simulate shadows and other indirect lighting effects.

Figure 2: The figure displays four 3D-rendered objects showcasing view-independent rendering effects. From left to right: a purple teapot on a green surface, a blue rectangular object on a red surface, an upside-down table casting shadows on a green surface, and a green apple-like object on a blue surface. Each object features diffuse lighting and coarse shadow effects, with distinct highlights and shadows produced by directional light sources.
Figure 2. View-independent rendering effects decoded directly from the view-independent transformer, including diffuse lighting and coarse shadow effects.

The view-dependent transformer was evaluated through attention visualizations. For example, in Figure 3, the attention map reveals a pixel on a teapot attending to its surface triangle and to a nearby wall—capturing the effect of specular reflection. These visualizations also show how material changes influence the sharpness and intensity of reflections.

Figure 3: The figure contains six panels arranged in two rows and three columns. The top row displays a teapot in a room with red and green walls under three different roughness values: 0.3, 0.7, and 0.99 (left to right). The bottom row shows the corresponding attention outputs for each roughness setting, featuring the teapot silhouette against a dark background with distinct light patterns that vary with roughness.
Figure 3. Visualization of attention outputs

Training methodology and dataset design

RenderFormer was trained using the Objaverse dataset, a collection of more than 800,000 annotated 3D objects that is designed to advance research in 3D modeling, computer vision, and related fields. The researchers designed four scene templates, populating each with 1–3 randomly selected objects and materials. Scenes were rendered in high dynamic range (HDR) using Blender’s Cycles renderer, under varied lighting conditions and camera angles.
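The scene-sampling procedure can be sketched as below. The template, object, and material names are placeholders invented for illustration; the actual pipeline draws objects from Objaverse and renders the scenes with Blender's Cycles renderer.

```python
import random

def sample_scene(templates, objects, materials, rng):
    """Assemble one training scene: a template populated with 1-3 randomly
    chosen objects, each assigned a random material (illustrative sketch)."""
    scene = {"template": rng.choice(templates), "objects": []}
    for mesh in rng.sample(objects, k=rng.randint(1, 3)):
        scene["objects"].append({"mesh": mesh,
                                 "material": rng.choice(materials)})
    return scene

rng = random.Random(0)
scene = sample_scene(["box", "plane", "corner", "open"],      # placeholder names
                     ["bunny", "teapot", "dragon", "lucy"],   # placeholder names
                     ["diffuse", "glossy", "metal"], rng)
print(1 <= len(scene["objects"]) <= 3)  # True
```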

The base model, consisting of 205 million parameters, was trained in two phases using the AdamW optimizer:

  • 500,000 steps at 256×256 resolution with up to 1,536 triangles
  • 100,000 steps at 512×512 resolution with up to 4,096 triangles
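The two-phase schedule above can be written down as a simple configuration. The step counts, resolutions, and triangle budgets come from the list; everything else (AdamW hyperparameters, batch size, learning-rate schedule) is omitted here.

```python
# Two-phase training schedule (values from the article; optimizer details
# such as AdamW hyperparameters are intentionally left out).
phases = [
    {"steps": 500_000, "resolution": 256, "max_triangles": 1_536},
    {"steps": 100_000, "resolution": 512, "max_triangles": 4_096},
]

total_steps = sum(p["steps"] for p in phases)
print(total_steps)  # 600000
```

The second, shorter phase fine-tunes the 205M-parameter base model at higher resolution and with a larger triangle budget.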

The model supports arbitrary triangle-based input and generalizes well to complex real-world scenes. As shown in Figure 4, it accurately reproduces shadows, diffuse shading, and specular highlights.

Figure 4: The figure presents a 3×3 grid of diverse 3D scenes rendered by RenderFormer. In the top row, the first scene shows a room with red, green, and white walls containing two rectangular prisms; the second features a metallic tree-like structure in a blue-walled room with a reflective floor; and the third depicts a red animal figure, a black abstract shape, and a multi-faceted sphere in a purple container on a yellow surface. The middle row includes three constant width bodies (black, red, and blue) floating above a colorful checkered floor; a green shader ball with a square cavity inside a gray-walled room; and crystal-like structures in green, purple, and red on a reflective surface. The bottom row showcases a low-poly fox near a pink tree emitting particles on grassy terrain; a golden horse statue beside a heart-shaped object split into red and grey halves on a reflective surface; and a wicker basket, a banana and a bottle placed on a white platform.
Figure 4. Rendered results of different 3D scenes generated by RenderFormer

RenderFormer can also generate continuous video by rendering individual frames, thanks to its ability to model viewpoint changes and scene dynamics.

Looking ahead: Opportunities and challenges

RenderFormer represents a significant step forward for neural rendering. It demonstrates that deep learning can replicate and potentially replace the traditional rendering pipeline, supporting arbitrary 3D inputs and realistic global illumination—all without any hand-coded graphics computations.

However, key challenges remain. Scaling to larger and more complex scenes with intricate geometry, advanced materials, and diverse lighting conditions will require further research. Still, the transformer-based architecture provides a solid foundation for future integration with broader AI systems, including video generation, image synthesis, robotics, and embodied AI. 

Researchers hope that RenderFormer will serve as a building block for future breakthroughs in both graphics and AI, opening new possibilities for visual computing and intelligent environments.






AI Research

YSU: Grant puts YSU at forefront of AI research – WFMJ.com


AI Research

Advarra launches AI- and data-backed study design solution to improve operational efficiency in clinical trials

Advarra, the market leader in regulatory reviews and a leading provider of clinical research technology, today announced the launch of its Study Design solution, which uses AI- and data-driven insights to help life sciences companies design protocols for greater operational efficiency in the real world.

Study Design solution evaluates a protocol’s feasibility by comparing it to similar trials using Braid™, Advarra’s newly launched data and AI engine. Braid is powered by a uniquely rich set of digitized protocol-related documents and operational data from over 30,000 historical studies conducted by 3,500 sponsors. Drawing on Advarra’s institutional review board (IRB) and clinical trial systems, this dataset spans diverse trial types and therapeutic areas, provides granular detail on schedules of assessment, and tracks longitudinal study modifications, giving sponsors deeper insights than solutions based only on in-house or public datasets. 

“Too often, clinical trial protocols are developed without the benefit of robust comparative intelligence, leading to inefficient designs and operations,” said Laura Russell, senior vice president, head of data and AI product development at Advarra. “By drawing on the industry’s largest and richest operational dataset, Advarra’s Study Design solution delivers deeper insights into the feasibility of a protocol’s design. It helps sponsors better anticipate downstream operational challenges, make more informed decisions to simplify trial designs, and accelerate protocol development timelines.”

Advarra’s Study Design solution can be used to optimize a protocol prior to final submission or for retrospective analyses. The solution provides insights on design factors that drive operational feasibility, such as the impact of eligibility criteria, the burden the schedule of assessment places on sites and participants, and the reasons behind amendments. Study teams receive custom benchmarking that allows for operational risk assessments through tailored data visualizations and consultations with Advarra’s data and study design experts. Technical teams can work directly within Advarra’s secure, self-service insights workspace to explore operational data for powering internal analyses, models, and business intelligence tools.

“Early pilots have already demonstrated measurable impact,” added Russell. “In one engagement, benchmarking a sponsor’s protocol against comparable studies revealed twice as many exclusion criteria and 60 percent more site visits than industry benchmarks. With these insights, the sponsor saw a path to streamline future trial designs by removing unnecessary criteria, clustering procedures, and adopting hybrid visit models, ultimately reducing site burden and making participation easier for patients.”

Study Design solution is the first in a series of offerings by Advarra that will be powered by Braid. Future applications will extend insights beyond protocol design to improve study startup, enhance collaboration, and better support sites.

To learn more about Study Design solution or to request a consultation, visit advarra.com/study-design.

About Advarra
Advarra breaks the silos that impede clinical research, aligning patients, sites, sponsors, and CROs in a connected ecosystem to accelerate trials. Advarra is number one in research review services, a leader in site and sponsor technology, and is trusted by the top 50 global biopharma sponsors, top 20 CROs, and 50,000 site investigators worldwide. Advarra solutions enable collaboration, transparency, and speed to optimize trial operations, ensure patient safety and engagement, and reimagine clinical research while improving compliance. For more information, visit advarra.com.

AI Research

Best Artificial Intelligence (AI) Stock to Buy Now: Nvidia or Palantir?

Palantir has outperformed Nvidia so far this year, but investors shouldn’t ignore the chipmaker’s valuation.

Artificial intelligence (AI) investing is a remarkably broad field, as there are numerous ways to profit from this trend. Two of the most popular are Nvidia (NVDA) and Palantir (PLTR), which represent two different sides of AI investing.

Nvidia is on the hardware side, while Palantir produces AI software. These are two lucrative fields to invest in, but is there a clear-cut winner? Let’s find out.


Palantir’s business model is more sustainable

Nvidia manufactures graphics processing units (GPUs), which have become the preferred computing hardware for processing AI workloads. While Nvidia has made a ton of money selling GPUs, it’s not done yet. Nvidia expects the big four AI hyperscalers to spend around $600 billion in data center capital expenditures this year, but projects that global data center capital expenditures will increase to $3 trillion to $4 trillion by 2030. That’s a major spending boom, and Nvidia will reap a substantial amount of money from that rise.

However, Nvidia isn’t completely safe. Its GPUs could fall out of favor with AI hyperscalers as they develop in-house AI processing chips that could take some of Nvidia’s market share. Furthermore, if demand for computing equipment diminishes, Nvidia’s revenue streams could shrink. That’s why a subscription model like Palantir’s is a better business over the long term.

Palantir develops AI software that can be described as “data in, insights out.” By using AI to process a ton of information rapidly, Palantir can provide real-time insights for what those with decision-making authority should do. Furthermore, it also gives developers the power to deploy AI agents, which can act autonomously within a business.

Palantir sells its software to commercial clients and government entities and has gathered a sizable customer base that is still rapidly expanding. As the AI boom continues, these customers will likely stick with Palantir because it’s incredibly difficult to move away from the software once it has been deployed. This means that after the AI spending boom is complete, Palantir will still be able to generate continuous revenue from its software subscriptions.

This gives Palantir a business advantage.

Nvidia is growing faster

Although Palantir’s revenue growth is accelerating, it’s still slower than Nvidia’s.

Chart: NVDA revenue (quarterly YoY growth), data by YCharts.

This may invert sometime in the near future, but for now, Nvidia has the growth edge.

One item that could reaccelerate Nvidia’s growth is the return of its business in China. Nvidia is currently working to obtain an export license for its H20 chips. Once that license is granted, the company could see massive demand from a market that requires significant AI computing power. Even without that chunk of sales, Nvidia is still growing faster than Palantir, giving it the advantage here.

Nvidia is far cheaper than Palantir

With both companies growing at a similar rate, it would be logical to expect that they should trade within a similar valuation range. However, that’s not the case. Whether you analyze the stocks from a forward price-to-earnings (P/E) or price-to-sales (P/S) basis, Palantir’s stock is unbelievably expensive.

Chart: NVDA PE ratio (forward), data by YCharts.

From a P/S basis, Palantir is about 5 times more expensive than Nvidia. From a forward P/E basis, it’s about 6.5 times more expensive.

With these two growing at the same rate, this massive premium for Palantir’s stock doesn’t make a ton of sense. It will take years, or even a decade, at Palantir’s growth rate to bring its valuation down to a reasonable level; yet, Nvidia is already trading at that price point.

I think this gives Nvidia an unassailable advantage for investors, and I think it’s the far better buy right now, primarily due to valuation, as Palantir’s price has gotten out of control.

Keithen Drury has positions in Nvidia. The Motley Fool has positions in and recommends Nvidia and Palantir Technologies. The Motley Fool has a disclosure policy.


