Interested in an internship or a thesis?
New topics are often still in preparation and not yet listed here. It may also be possible to define a topic that matches your specific interests. If you are interested in contributing to our work, do not hesitate to contact our scientific staff. For further questions about a thesis at the institute, please contact Dr. Thomas Wild.
Implementation of a Lossless Data Compression Algorithm for Chiplet Interconnects
Description
In the BCDC project, a working group at TUM is collaborating on the design of a RISC-V-based chiplet demonstration chip. Two of these chips are connected via an interposer to represent a system of interconnected chiplets. At LIS, we work on an efficient, low-latency chiplet interconnect with additional application-specific features managed by a Smart Chiplet Interconnect layer stack. This layer stack bridges the gap between the underlying physical layer, which handles data transmission across the interposer, and the system bus, which attaches the inter-chiplet interface to the other components of the demonstration chip. The design is based on the PULP platform's Serial Link.
As one of the key features of the Smart Chiplet Interconnect, we are developing an on-the-fly lossless data compression module that reduces the amount of data transmitted across the interposer and thus increases the effective bandwidth of the low-pin-count interface. A Python version of the LZ4-based algorithm is available; it extends the baseline with features such as an inserted encoding stage and preloaded or fixed dictionary entries.
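To illustrate the idea of preloaded dictionary entries, the following is a minimal sketch of a greedy LZ77-style compressor whose history buffer can be seeded before any payload arrives. It is not the project's Python reference and does not produce the LZ4 block format; the token representation, the `window` size, and the function names are assumptions made for illustration only.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a back-reference (offset, length).
# This token format is an illustrative assumption, not the LZ4 block format.
Token = Union[int, Tuple[int, int]]

MIN_MATCH = 4  # LZ4 also uses a minimum match length of 4 bytes


def compress(data: bytes, preload: bytes = b"", window: int = 64) -> List[Token]:
    """Greedy LZ77-style compression; `preload` seeds the history buffer."""
    buf = preload + data
    pos = len(preload)
    tokens: List[Token] = []
    while pos < len(buf):
        best_len, best_off = 0, 0
        # Search the sliding window (including preloaded bytes) for the longest match.
        for cand in range(max(0, pos - window), pos):
            length = 0
            while pos + length < len(buf) and buf[cand + length] == buf[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))
            pos += best_len
        else:
            tokens.append(buf[pos])
            pos += 1
    return tokens


def decompress(tokens: List[Token], preload: bytes = b"") -> bytes:
    """Inverse of compress(); the receiver must use the same preload."""
    out = bytearray(preload)
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            for _ in range(length):
                out.append(out[-off])  # byte-wise copy permits overlapping matches
        else:
            out.append(tok)
    return bytes(out[len(preload):])
```

In this sketch, a short payload that already appears in the preloaded history collapses into a single back-reference, whereas without the preload it would be emitted as literals. Both sides of the link would need to agree on the preloaded contents.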
In this project, the student will be responsible for implementing the module in SystemVerilog. This includes realizing hardware-specific optimizations for performance and resource usage. Alongside the compression module, the student will also implement the simpler corresponding decompression module for optional decompression on the receiving chiplet.
After verifying matching functionality with the Python reference, the student will evaluate the performance of the implemented modules. For this, the modules should be integrated into a minimal version of the chiplet interconnect stack and fed with realistic data patterns as they would arrive from the system bus or the interconnect. The evaluation will focus on the achievable compression ratio and latency of the hardware implementation. To estimate resource usage and the maximum achievable clock frequency, the student will synthesize the design for the VCU118 FPGA evaluation board.
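As a rough illustration of the compression-ratio metric, the sketch below relates the ratio to the effective bandwidth of the link. The function names are hypothetical, and the model is a simplification: it ignores compression latency, protocol overhead, and incompressible traffic.

```python
def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Ratio of original to compressed size; values above 1.0 mean data was saved."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return uncompressed_bytes / compressed_bytes


def effective_bandwidth(raw_bandwidth: float, ratio: float) -> float:
    """Idealized payload bandwidth seen by the system bus for a given link bandwidth."""
    return raw_bandwidth * ratio
```

For example, under this idealized model a trace that shrinks from 4096 to 2048 bytes has a ratio of 2.0 and would double the payload bandwidth of the link.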
The project will run alongside another Bachelor's thesis on the Smart Chiplet Interconnect. Depending on the progress of the two projects, a combination and joint evaluation of the two designs may be possible and is encouraged.
Prerequisites
- Experience with hardware design in (System)Verilog
- Ideally, familiarity with data compression algorithms
- Structured way of working and strong problem-solving skills
- Interest in novel system architectures
Contact
michael.meidinger@tum.de
Supervisor:
Evaluation of a Page-Based Memory Preload Architecture Using Standardized Embedded Benchmarks
Description
Modern MPSoC architectures are increasingly limited by off-chip memory latency. To mitigate this bottleneck, a page-based hardware preload unit has been developed that speculatively transfers DRAM pages upon last-level cache misses in order to hide memory access latency.
The goal of this Bachelor's thesis is to perform a systematic and scientifically sound evaluation of this architecture using internationally recognized embedded benchmark suites. The work will focus on identifying, porting, and executing suitable bare-metal benchmarks on an FPGA-based RISC-V platform (CVA6 architecture). Candidate benchmark suites include Embench, CoreMark, PolyBench/C, and MiBench, as well as other memory-intensive workloads. The final selection will be made during the course of the thesis based on feasibility and relevance.
The thesis involves implementing the benchmarks in the existing hardware/software framework, conducting structured performance measurements, and comparing different system configurations (e.g., with and without the preload unit). Particular emphasis will be placed on analyzing memory behavior, working-set characteristics, and access patterns.
Beyond implementation, the thesis will provide a scientific evaluation of how different workload classes interact with page-based preloading. Results will be analyzed quantitatively and presented in a clear and reproducible manner using normalized speedups and workload classifications.
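A normalized speedup could be computed as sketched below: the execution time without the preload unit divided by the execution time with it, summarized across benchmarks by a geometric mean. The function names and the use of cycle counts as the time measure are assumptions for illustration.

```python
import math
from typing import Dict, Iterable


def normalized_speedups(baseline: Dict[str, float], preload: Dict[str, float]) -> Dict[str, float]:
    """Per-benchmark speedup: time without the preload unit / time with it."""
    return {name: baseline[name] / preload[name] for name in baseline}


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean, the usual way to summarize ratios such as speedups."""
    vals = list(values)
    if not vals:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))
```

A speedup above 1.0 means the configuration with the preload unit is faster for that benchmark; the geometric mean avoids a single outlier dominating the summary.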
The outcome of this work will provide a solid experimental foundation for further research and potential publications in the area of memory-optimized MPSoC architectures.
Prerequisites
- Good knowledge of MPSoCs
- Good C programming skills
- Basic understanding of hardware-oriented programming
- High motivation
- Independent working style
Contact
Oliver Lenke
o.lenke@tum.de
Supervisor:
Student
Balancing Preload Efficiency and Responsiveness through Adaptive Burst Lengths
Description
Page-based memory preloading typically relies on fixed burst lengths to transfer data efficiently from DRAM. While long bursts maximize preload throughput, they reduce responsiveness to demand-driven CPU memory accesses. Short bursts improve reactivity but underutilize available memory bandwidth.
This thesis builds on the existing page-based preload unit and investigates a hardware-based mechanism for dynamically adjusting the preload burst length according to current memory system utilization. The goal is to balance preload efficiency and fast reaction to demand accesses at runtime. The proposed mechanism adapts the burst length based on simple runtime indicators such as DRAM activity or the presence of competing CPU requests. The implementation extends the existing preload FSM and does not require any modifications to the CPU microarchitecture.
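One possible policy of this kind can be sketched as a small behavioral model: the burst length steps down when a demand access is pending or DRAM utilization is high, and steps up otherwise. The set of burst lengths, the utilization threshold, and the function name are assumptions for illustration; the actual mechanism is the subject of the thesis.

```python
# Assumed set of selectable burst lengths (in beats); not taken from the preload unit.
BURST_LENGTHS = (4, 8, 16, 32)


def next_burst_length(current: int, demand_pending: bool,
                      dram_utilization: float, high_util: float = 0.75) -> int:
    """Pick the burst length for the next preload transfer.

    Shrinks the burst by one step when CPU demand requests compete or the
    DRAM is busy, and grows it by one step when the memory system is idle.
    """
    idx = BURST_LENGTHS.index(current)
    if demand_pending or dram_utilization >= high_util:
        idx = max(idx - 1, 0)  # favor responsiveness
    else:
        idx = min(idx + 1, len(BURST_LENGTHS) - 1)  # favor throughput
    return BURST_LENGTHS[idx]
```

Stepping one level at a time keeps the model simple and would map naturally onto a few extra states or a small counter in an FSM, though other policies are equally conceivable.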
Evaluation on an FPGA-based platform analyzes execution time, interference with demand accesses, and bandwidth utilization under different memory-intensive workloads. The results aim to demonstrate that adaptive burst sizing is an effective and low-overhead technique to improve the robustness of memory-side preloading.
Prerequisites
- Good knowledge of MPSoCs
- Good C programming skills
- High motivation
- Independent working style
Contact
Oliver Lenke
o.lenke@tum.de