Analyzing and understanding the memory access behaviors of applications are essential when optimizing computing systems and applications. One prominent example is the cache miss curve estimation, i.e., detecting the cache miss ratio as a function of the capacity using a memory access sequence. Historically, cache miss curves have been derived from their corresponding stack distance distributions that are obtained by keeping track of the depth on the LRU stack. As this procedure requires significant computational complexity, a variety of approximation techniques have been proposed ever since it was originally proposed by Mattson et al. in 70s.
We, however, claim that stack distances are not necessarily required to derive a cache miss curve.
This paper proposes an alternative, efficient approach: instead of relying on the stack distances, we approximate a cache miss curve from the cardinality of accesses — total access count as a function of unique access count. By doing so, we can apply well-established efficient probabilistic data structures (e.g., Log-Log counting and its variants) widely used in various domains. In our approach, we model LRU stack behavior as a Bernoulli trial and derive a macroscopical relationship between a cache miss curve and its corresponding cardinality curve. Driven by this relationship, we offer our miss curve estimation mechanism using cardinality detection data structures and a curve fitting approach.
We further offer several advanced techniques by taking advantage of the cardinality domain: (1) a compensation mechanism for sampled memory traces and (2) an algorithm to synthesize a miss curve of multiple access sequences by following the additivity principle in the cardinality domain. We comprehensively validate our approach using SPEC CPU 2017 benchmark suite under a variety of scenarios.
Contact: Eishi Arima
Link: