Cache Latency

Overview

The cache_lat test measures latency through the cache hierarchy and interconnect by forcing serially dependent memory reads, the worst case for latency hiding.

Execution Mechanics

GPUs normally hide memory latency by keeping many independent requests in flight, and any hardware prefetching relies on predictable access patterns. This kernel defeats both through pointer chasing.

  • It uses Linear Congruential Generator (LCG) constants to build a randomized memory traversal map.
  • Because the address of every read is the value returned by the previous read, the hardware can neither prefetch the data nor overlap the loads. Each read stalls for its full round trip, so the average time per read exposes the latency of the cache level and interconnect being exercised.
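
The two points above can be illustrated with a minimal host-side C++ sketch. It is not the cache_lat kernel itself: the function names build_chain and chase are hypothetical, and the LCG constants are the widely published Numerical Recipes pair, assumed here because the text does not say which constants the test uses. With a power-of-two table size, an odd increment, and a multiplier congruent to 1 modulo 4, the map visits every slot exactly once before returning to the start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed LCG constants (Numerical Recipes); the constants used by
// cache_lat are not given in the text.
constexpr std::uint64_t kLcgA = 1664525;     // multiplier, kLcgA % 4 == 1
constexpr std::uint64_t kLcgC = 1013904223;  // increment, odd

// Build the traversal map: next[i] = (a * i + c) mod n.
// n must be a power of two so that the map forms one cycle through
// all n slots (Hull-Dobell conditions for a full-period LCG).
std::vector<std::uint32_t> build_chain(std::size_t n) {
    assert(n >= 2 && (n & (n - 1)) == 0);
    std::vector<std::uint32_t> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        next[i] = static_cast<std::uint32_t>((kLcgA * i + kLcgC) & (n - 1));
    }
    return next;
}

// Pointer chase: each load's index is the value produced by the
// previous load, so the loads cannot be issued ahead of time.
std::uint32_t chase(const std::vector<std::uint32_t>& next,
                    std::uint32_t start, std::size_t steps) {
    std::uint32_t idx = start;
    for (std::size_t s = 0; s < steps; ++s) {
        idx = next[idx];
    }
    return idx;  // returned so the compiler cannot discard the loop
}
```

A latency measurement would time a call such as chase(build_chain(1 << 20), 0, steps) and divide the elapsed time by steps. On a GPU the same loop would typically run in a single thread, so that no other work hides the stalls.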

Target Subsystems

  • Primary Target: L1/L2 Cache Latency and Interconnects.