Accepted paper at ProTools'25 (SC Conference Workshop): "MT4G: A Tool for Reliable Auto-Discovery of NVIDIA and AMD GPU Compute and Memory Topologies" by Stepan Vanecek, Manuel Walter Mußbacher, Dominik Größler, Urvij Saroliya, and Martin Schulz (all TUM).
The paper will be presented at the ProTools workshop at SC'25 (St. Louis, MO, USA) on 17 November 2025, 2:30–3:00 pm (Central Time), in room 241.
Link to the presentation (with possible video recording):
https://sc25.conference-program.com/presentation/?id=ws_prot103&sess=sess225
Understanding GPU topology is essential for performance-related tasks in HPC or AI. Yet, unlike for CPUs with tools like hwloc, GPU information is hard to come by, incomplete, and vendor-specific.
In this work, we address this gap and present MT4G, an open-source and vendor-agnostic tool that automatically discovers GPU compute and memory topologies and configurations, including cache sizes, bandwidths, and physical layouts. MT4G combines existing APIs with a suite of over 50 microbenchmarks, applying statistical methods, such as the Kolmogorov-Smirnov test, to automatically and reliably identify otherwise programmatically unavailable topological attributes.
We showcase MT4G’s universality on ten different GPUs and demonstrate its impact through integration into three workflows: GPU performance modeling, GPUscout bottleneck analysis, and dynamic resource partitioning. These scenarios highlight MT4G’s role in understanding system performance and characteristics across NVIDIA and AMD GPUs, providing an automated, portable solution for modern HPC and AI systems.
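To illustrate the idea of pairing microbenchmarks with the Kolmogorov-Smirnov test, here is a minimal sketch in Python. It is not MT4G's implementation: the function names (ks_statistic, find_boundary, synthetic_latencies), the 128 KiB cache capacity, the latency values, and the 0.5 threshold are all assumptions made for illustration, and synthetic samples stand in for real GPU measurements. The sketch compares latency distributions of neighboring array sizes and reports the first size at which the distribution shifts.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def find_boundary(samples_by_size, threshold=0.5):
    """Return the first array size whose latency distribution differs
    from that of the previous size, or None if no shift is found.
    The threshold is an arbitrary illustrative value."""
    sizes = sorted(samples_by_size)
    for prev, cur in zip(sizes, sizes[1:]):
        if ks_statistic(samples_by_size[prev], samples_by_size[cur]) > threshold:
            return cur
    return None


def synthetic_latencies(size_kib, cache_kib=128, n=200, seed=0):
    """Hypothetical stand-in for a pointer-chasing microbenchmark:
    about 30 cycles while the array fits in the assumed cache,
    about 200 cycles once it no longer fits."""
    rng = random.Random(seed + size_kib)
    mean = 30.0 if size_kib <= cache_kib else 200.0
    return [rng.gauss(mean, 2.0) for _ in range(n)]


samples = {s: synthetic_latencies(s) for s in (32, 64, 96, 128, 160, 192)}
boundary = find_boundary(samples)
print(f"latency distribution shifts at {boundary} KiB")
```

With this synthetic data, the shift is reported at 160 KiB, the first tested size beyond the assumed 128 KiB capacity. Comparing whole distributions rather than single averages makes the detection less sensitive to individual noisy measurements.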
Link: