AMD RDNA4 Architecture: Complete Overview and Analysis
Out of Order Memory?
The memory subsystem in RDNA4 shows some interesting choices from AMD. Both the RX 9070 and RX 9070 XT come equipped with 16GB of VRAM, which is generous for cards in this performance tier and matches what NVIDIA offers with the RTX 5070 Ti. This should provide ample headroom for current and upcoming titles, even at 4K with high-resolution textures.
On the ray tracing side, AMD says the new architecture implements “Dual Ray Intersection” capabilities, which essentially allow the GPU to process two ray intersections simultaneously. Combined with the oriented bounding box optimizations mentioned earlier, this results in far fewer ray traversal steps for the same scene.
But what’s really fascinating is the new “out-of-order memory” architecture that AMD has implemented with RDNA4. In previous generations, memory requests had to be returned in the order they were issued, which created bottlenecks whenever some requests took much longer than others: a fast request queued behind a slow one had to wait even though its data was already available. RDNA4 introduces additional out-of-order queues for memory requests, so data that’s ready can be returned immediately without waiting for older, high-latency requests to complete first. This is especially important for ray tracing workloads, which AMD’s slides specifically call out as being highly sensitive to memory latency. When you’re tracing rays through a BVH structure and also accessing textures and buffers for shading, the memory access patterns, and therefore the latencies, can differ wildly from one request to the next.
With RDNA4, shaders can keep executing efficiently even when some requests have long latency, because the architecture allows requests from different shaders to be satisfied out of order. In practical terms, this means operations like surface shading won’t be held up by something like an uncached leaf node access, resulting in better overall performance across many workloads. It’s a smart approach to addressing one of the key bottlenecks in modern GPU performance.
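To make the difference concrete, here is a minimal toy model in Python that compares in-order and out-of-order return of memory requests. It is only a sketch of the general idea, not a description of AMD's hardware: the request names, issue cycles, and latencies are all made-up assumptions chosen for illustration, and the Request class and both functions are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    issue_cycle: int  # cycle at which the request was issued
    latency: int      # cycles until the data is available (illustrative value)

    @property
    def ready_cycle(self) -> int:
        return self.issue_cycle + self.latency


def in_order_return(requests):
    """Data is handed back strictly in issue order, so a request cannot
    return before every older request has returned."""
    times = {}
    last_return = 0
    for req in sorted(requests, key=lambda r: r.issue_cycle):
        last_return = max(last_return, req.ready_cycle)
        times[req.name] = last_return
    return times


def out_of_order_return(requests):
    """Data is handed back as soon as it is ready, regardless of age."""
    return {req.name: req.ready_cycle for req in requests}


# Hypothetical workload: one slow, uncached BVH leaf access issued first,
# followed by two fast accesses needed for surface shading.
requests = [
    Request("bvh_leaf_miss", issue_cycle=0, latency=500),
    Request("texture_hit", issue_cycle=1, latency=40),
    Request("buffer_hit", issue_cycle=2, latency=60),
]

in_order = in_order_return(requests)
out_of_order = out_of_order_return(requests)

for req in requests:
    print(f"{req.name:14s} in-order: {in_order[req.name]:4d}   "
          f"out-of-order: {out_of_order[req.name]:4d}")
```

In this toy setup the two fast requests are stuck behind the slow BVH access when returns are in order, while the out-of-order model hands their data back as soon as it is ready. The slow request itself finishes at the same time in both cases; the gain comes entirely from the requests that no longer have to wait behind it.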