Practical Course: Experimental Evaluation of Modern Computing Systems and Accelerators (IN0012, IN2106, IN2397, IN4294)
This lab course will be held in English.
Bachelor/Master practical course (IN0012, IN4294, IN2106, IN2397).

The Course
This lab is organized in collaboration with LRZ and LMU.
It is offered as a single course for both bachelor and master students.
What will you learn in this course?
The LRZ operates a testbed environment with novel architectures and accelerators called BEAST (Bavarian Energy, Architecture and Software Testbed). The testbed helps shape future large LRZ systems (such as successors of SuperMUC-NG). In this lab you will gain hands-on experience on the BEAST testbed.
You will:
- have the chance to work with the newest HPC technology, such as CPUs and GPUs from Intel (Sapphire Rapids Xeon, Intel GPU Max), AMD (Milan, MI210), NVIDIA (Grace Hopper), and Fujitsu (A64FX),
- explore various features of novel technology on the node level,
- learn about node level performance optimizations,
- attend vendor talks on their newest hardware,
- learn various programming models on the node level for multicore CPUs and accelerators, such as OpenMP (also for GPUs), SIMD, synchronization, and Kokkos (a vendor-agnostic GPU programming model on top of CUDA / HIP / SYCL). In later assignments, you can optionally use CUDA / HIP / SYCL directly (bonus).
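As a rough illustration of the node-level programming models listed above, the following is a minimal sketch of an OpenMP loop that combines thread-level parallelism with SIMD vectorization. It is not taken from the course material; the function name `dot` is hypothetical. The pragma only takes effect when OpenMP is enabled at compile time (for example `-fopenmp` with GCC or Clang); otherwise the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example, not part of the course assignments:
// dot product using OpenMP threads plus SIMD lanes.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    const double* xp = x.data();
    const double* yp = y.data();
    double sum = 0.0;
    // Each thread accumulates a private partial sum; the reduction clause
    // combines the partial sums when the loop finishes.
    #pragma omp parallel for simd reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += xp[i] * yp[i];
    }
    return sum;
}
```

Called as `dot(a, b)` on two equally sized vectors. With general floating-point data, the result may differ in the last bits between runs, because the order in which partial sums are combined is not fixed.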
Structure
Students work in groups of two (master students) or three (bachelor students). The lab comes with weekly assignments and consists of two parts.
In the first part, you evaluate the available systems using provided micro-benchmarks, which you may need to modify, and run various measurement experiments. In your reports, you discuss the outcomes, explain the effects seen in the measurements, and compare results across systems. We look at both CPUs and GPUs. Apart from introductory talks, the weekly meetings are mainly used for student presentations and discussion of the observed effects. Topics covered by the assignments:
- available profiling tools
- bandwidths and latencies of memory accesses, effects of caches and optimizations
- compute unit performance, SIMD, instruction/thread level parallelism, branch prediction, intrinsics
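To give an idea of what a memory-bandwidth micro-benchmark can look like, here is a minimal single-threaded sketch in the style of a STREAM triad. It is an assumption-laden illustration, not one of the benchmarks provided in the course: the function name `triad_bandwidth_gbs` and its parameters are hypothetical, and the byte count ignores any extra write-allocate traffic the hardware may generate.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper, not one of the provided course benchmarks.
// Runs the triad a[i] = b[i] + s * c[i] `reps` times over n doubles and
// returns the best observed bandwidth in GB/s (0.0 if the timer was too
// coarse to measure the loop).
double triad_bandwidth_gbs(std::size_t n, int reps) {
    assert(n > 0 && reps > 0);
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);
    const double s = 3.0;
    double best_seconds = -1.0;

    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = b[i] + s * c[i];
        }
        const auto t1 = std::chrono::steady_clock::now();
        const double sec = std::chrono::duration<double>(t1 - t0).count();
        if (best_seconds < 0.0 || sec < best_seconds) {
            best_seconds = sec;
        }
        // Read one result back so the compiler cannot discard the loop.
        volatile double sink = a[n / 2];
        (void)sink;
    }

    if (best_seconds <= 0.0) {
        return 0.0;
    }
    // Two loads and one store of a double per element.
    const double bytes = 3.0 * sizeof(double) * static_cast<double>(n);
    return bytes / best_seconds / 1e9;
}
```

Varying `n` from a few kilobytes to well beyond the last-level cache size should make the cache hierarchy visible as steps in the reported bandwidth. A single thread usually cannot saturate the memory system, so a multi-threaded variant would be needed to estimate the bandwidth of the full node.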
The second part consists of two real-world HPC codes that you are expected to tune using your knowledge from the first part. For each of these two assignments, each group is assigned one of the systems.
Grading
Students work in groups, writing a report for each task and giving short presentations of their evaluation results (two groups are selected every week to present their reports). Grading is based on the submitted reports and the short presentations. The source code used for the evaluations is part of the graded material and must be included with the reports.
Registration
Registration for the lab takes place via the matching system.
Expected Skills
- Good knowledge of C or C++ on Linux
- Understanding of terms in computer architecture: pipelining, SIMD, multi-core, SMT, TLB, processor caches, replacement policies, GPU architecture (before registering in the matching system, ask yourself if you can explain these terms!)
- Interest in computer architecture, benchmarking, and low-level code optimization
- Basic prior knowledge of GPU programming and architecture is helpful but not required.
- Successful completion of the Parallel Programming course (IN2147) is helpful but not required.
Schedule / Dates
Weekly in-person meetings during the semester: Thursdays, 4pm - 6pm, location: FMI 01.06.020 (sometimes at LMU)
First date: October 16, 2025