Bachelor's Theses
Web-Based Digital Microfluidic (DMF) Design Platform
Description
Project Overview
Digital Microfluidics (DMF) is a cutting-edge technology that enables the precise manipulation of minute fluid volumes (droplets) via electrical actuation. We currently have a functional web-based design tool that allows researchers to create custom PCB-based and glass-based DMF chips. This platform streamlines the transition from concept to manufacturable hardware by providing features like custom electrode placement, automated routing, and experiment definition.
We are looking for motivated students to join our follow-up project. The goal is to extend the platform's functional modules and refine the core routing algorithms to handle increasingly complex chip architectures.
Tasks
As a student on this project, you will focus on two primary areas:
1. Platform Extension & Feature Enhancement
- Integrated Path Planning: Develop an automated droplet path planning feature where users can select start and end points, and the system generates the optimal movement sequence.
- Functional Module Libraries: Create templates and interfaces for specialized biological and chemical detection modules to improve design efficiency for specific experimental scenarios.
- Advanced UI/UX: Enhance the interactive editor, building upon existing features like "undo/redo," "copy/paste," and the "parallel electrode" batch processing system.
2. Routing Algorithm Refinement
- Algorithm Optimization: Work with our existing WebAssembly (WASM) and Web Worker-based routing engine to improve performance and success rates for high-density designs.
- Geometric Refinement: Modify the grid-based routing and collision detection logic to support finer electrode spacings and complex trace widths.
- Via Management: Refine the dynamic via cost mechanisms to optimize vertical interconnections between PCB layers.
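The routing tasks above can be illustrated with a minimal sketch of grid-based routing across layers. It is not the platform's WASM engine: the function name `route`, the `(layer, x, y)` cell encoding, and the fixed `via_cost` parameter are assumptions made for illustration, and the engine's dynamic via cost mechanism is not modeled. The sketch runs Dijkstra's algorithm on a layered grid where an in-plane step costs 1 and a layer change costs `via_cost`.

```python
import heapq


def route(width, height, layers, blocked, start, goal, via_cost=5.0):
    """Cheapest path on a layered grid; cells are (layer, x, y) tuples.

    Hypothetical sketch: in-plane steps cost 1, layer changes cost via_cost.
    Returns (path, cost), or (None, inf) if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale heap entry
        layer, x, y = node
        # Four in-plane neighbours at unit cost.
        steps = [((layer, x + dx, y + dy), 1.0)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        # Vias to the adjacent layers at the same (x, y).
        steps += [((layer + dl, x, y), via_cost) for dl in (1, -1)]
        for nxt, step_cost in steps:
            l2, x2, y2 = nxt
            if not (0 <= l2 < layers and 0 <= x2 < width and 0 <= y2 < height):
                continue
            if nxt in blocked:
                continue
            nd = d + step_cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None, float("inf")
    # Walk the predecessor chain back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]
```

Raising `via_cost` makes the router prefer longer in-plane detours over layer changes, which is the trade-off a via cost mechanism tunes.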
Technical Environment
You will work with a modern, high-performance tech stack:
- Frontend: Vue 3, Element Plus, and SVG for vector graphics rendering.
- Core Logic: C++ (compiled to WebAssembly) for heavy computational tasks.
- Communication: Web Serial API for real-time hardware interfacing.
- Hardware Integration: Exporting KiCad-compatible files for physical PCB manufacturing.
Requirements
- Strong interest in Electronic Design Automation (EDA) or Microfluidics.
- Proficiency in JavaScript/TypeScript (preferably Vue 3) or C++.
- Basic understanding of geometric algorithms or PCB design is a plus.
Contact
If you are interested, please contact:
Be sure to include your current transcript and CV with your message.
Supervisor:
Open Research Topic: AI for Hardware Design & Systems
AI for Systems, Hardware Design, Machine Learning, Optimization
Do you have a novel idea at the intersection of AI/ML and hardware design? We are looking for highly motivated students to propose and pursue their own research ideas in this space—from applying modern AI techniques to traditional hardware problems to exploring entirely new directions.
Description
The intersection of artificial intelligence and hardware/system design is rapidly evolving. Many traditional problems in areas such as chip design, optimization, and system architecture are being revisited with modern machine learning techniques—yet there is still vast untapped potential for new ideas.
This open topic is aimed at students who want to go beyond predefined projects and instead explore their own research direction. We are particularly interested in novel and creative approaches, including (but not limited to):
- Applying machine learning to classical hardware or EDA problems
- Reinforcement learning or optimization for system design and scheduling
- AI-driven design space exploration or co-design approaches
- Using modern paradigms such as foundation models or autonomous research/optimization agents
- Completely new ideas that challenge existing workflows or assumptions
The goal is to identify promising research directions and develop them into meaningful projects, with the potential to grow into a thesis or even a research publication.
You will work closely with your supervisor to refine your idea, scope the problem, and develop a concrete research plan, but the starting point should come from you.
Requirements
- Strong interest in research and innovation
- Familiarity with machine learning and/or systems is expected
- Ability to think independently and propose original ideas
- High motivation and curiosity
Contact
Please send:
- A short description of your idea (what you want to explore and why it is interesting)
- Your CV
- Your transcript of records
Supervisor:
AI-Driven Optimization for Chip Design (Macro Placement)
Chip Design, Physical Design Automation, Optimization
We are looking for motivated students to work on algorithmic approaches for chip design optimization, with a focus on macro placement. The project combines machine learning and combinatorial optimization and can be connected to an ongoing industry challenge with a submission deadline in May 2026.
Description
Modern chip design faces increasingly complex optimization challenges, where millions of design decisions must be made under tight constraints. One key problem in this space is macro placement: arranging large components (e.g., SRAM blocks, IPs) on a chip such that routing congestion, timing, power delivery, and area are jointly optimized.
This problem is inherently difficult: it involves a highly discrete design space, multiple competing objectives, and strong global dependencies between decisions. Classical methods have been refined for decades, yet recent advances in machine learning—particularly reinforcement learning and graph-based methods—suggest new opportunities for improvement.
In this research internship, you will work on developing and evaluating novel approaches for macro placement and related optimization problems. The focus is on designing efficient algorithms that can handle large-scale, highly constrained systems and produce high-quality solutions under realistic runtime constraints.
Possible directions include:
- learning-based approaches (e.g., reinforcement learning, GNNs),
- hybrid optimization methods combining heuristics and ML,
- scalable search and approximation techniques,
- or improving classical placement strategies with modern tooling.
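To make the optimization target concrete, the sketch below scores a candidate macro placement. It rests on assumptions not stated above: half-perimeter wirelength (HPWL) is used as a common stand-in for routing cost, macros are axis-aligned rectangles, and the names `hpwl` and `overlap_area` are hypothetical. Congestion, timing, and power delivery are not modeled.

```python
def hpwl(positions, nets):
    """Half-perimeter wirelength: sum of bounding-box half-perimeters per net.

    positions maps a macro name to its (x, y) pin location (assumed: one pin
    per macro); nets is a list of lists of macro names.
    """
    total = 0.0
    for net in nets:
        xs = [positions[name][0] for name in net]
        ys = [positions[name][1] for name in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def overlap_area(a, b):
    """Overlap area of two rectangles given as (x, y, width, height)."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dx, 0.0) * max(dy, 0.0)
```

A search or learning-based method could minimize `hpwl` while penalizing any nonzero `overlap_area`; real flows add further terms for the remaining objectives.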
As part of the project, there is the opportunity to evaluate your approach on a current industry-backed challenge:
https://partcl.com/blog/macro_placement_challenge
(Submission deadline: May 21, 2026)
While participation in the challenge is optional, it provides a concrete benchmark and external validation for your work. Projects may be conducted in small supervised groups, but individual contributions and reports are required.
The topic is well suited for continuation into a larger thesis project in the area of machine learning for systems and chip design.
Requirements
- Background in Computer Science, Electrical Engineering, or related field
- Strong programming skills
- Interest in optimization, algorithms, or machine learning
- Familiarity with ML methods (e.g., RL, deep learning, or GNNs) is a plus
- Strong problem-solving skills and willingness to work on complex systems
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Energy-Efficient AI Systems at Scale: From Optimization Models to Next-Generation Hardware & Tools
AI Systems, Hardware-Software Co-Design, Optimization, Energy Efficiency, Chiplets
Modern AI systems are pushing hardware to its limits, requiring new approaches to efficiently scale compute, memory, and energy. In this thesis, you will explore and extend cutting-edge optimization frameworks for large-scale AI workloads, with the opportunity to shape the direction of your research, from improving solver efficiency to building interactive tools or exploring learning-based optimization strategies.
Description
Recent advances in AI/ML models (e.g., large language models) demand unprecedented compute and memory resources, making efficient system design a central challenge. New approaches, such as energy-aware co-optimization of hardware architectures and workload execution, enable significant improvements in energy-delay efficiency and scalability.
This thesis builds on such optimization-driven frameworks and opens up a range of possible research directions. Rather than prescribing a fixed path, the goal is to let you explore and define your own contribution within this space, depending on your interests.
Possible directions include (but are not limited to):
- Optimization & Algorithms
  - Improve scalability and efficiency of optimization solvers (e.g., MIQP-based approaches)
  - Develop approximation, heuristic, or hybrid optimization techniques
  - Explore alternative formulations for large-scale design space exploration
- AI for Systems / Learning-Based Methods
  - Investigate reinforcement learning or learning-based approaches for scheduling, mapping, or architecture design
  - Compare learned vs. analytical optimization strategies
- Scalable Systems & Workloads
  - Extend analyses to extremely large workloads (e.g., LLM-scale systems)
  - Study trade-offs between performance, energy, and hardware constraints
- Hardware & Architecture Exploration
  - Analyze emerging architectures such as multi-chiplet systems
  - Explore memory hierarchies, interconnects, and power management strategies
- Tooling & Visualization
  - Develop intuitive interfaces or visual analytics tools for design space exploration
  - Make complex optimization results interpretable and interactive
The work can be adapted toward a more theoretical, systems-oriented, or practical/software-driven thesis. We aim to produce publishable research results, making this an excellent opportunity for students interested in academia or advanced R&D.
Requirements
- Solid programming skills in Python
- Basic understanding of optimization, algorithms, or AI/ML concepts
- Interest in systems, performance, or hardware-software co-design
- Ability and motivation to quickly learn new concepts across multiple domains
- Strong analytical thinking and problem-solving skills
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Transferable Power Estimation Based on the NetTAG Framework
Description
Power dissipation of integrated circuits (ICs) is a critical design metric: it directly determines the battery life of edge devices and the cooling requirements of servers. Power-aware IC design therefore requires precise power modeling within the design flow. Typically, such a model maps input features, such as input signal activities or the number of gates in the design, to dynamic, static, or total power. Machine learning (ML)-based models have recently come into focus for this task.
The drawback of ML-based models is their limited transferability from the circuit designs used in training to unseen circuits. Foundation models, such as large language models, have shown great potential in other domains thanks to their generalizability, so they could also help with the transferability problem in power modeling. Foundation models designed specifically for ICs, such as NetTAG [1], have been proposed. However, it remains an open question whether these complex frameworks provide a significant benefit for power modeling.
The goals of this project are:
- Get familiar with the NetTAG framework and its adapted version at the chair
- Design a downstream task for power estimation
- Evaluate the transferability of NetTAG combined with the downstream task
[1] Fang, Wenji, et al. "NetTAG: A Multimodal RTL-and-Layout-Aligned Netlist Foundation Model via Text-Attributed Graph." 2025 62nd ACM/IEEE Design Automation Conference (DAC). IEEE, 2025.
Requirements
- In-depth knowledge of Python
- Excellent debugging skills
- Good knowledge of the digital IC design flow
- Good knowledge of HDL designs at RTL and netlist level (preferably, Verilog)
- Basic knowledge of foundation models
- Highly motivated, independent, and organized working style
Contact
If you are interested, please send your application to philipp.fengler@tum.de
Supervisor:
Integrity Verification Schemes for Distributed AI Inference on Chiplets
Integrity, Safety, Security, Fault Tolerance
This project focuses on integrity verification mechanisms for distributed AI inference on chiplet-based architectures. The goal is to analyze and evaluate lightweight techniques for detecting faults or corrupted intermediate results during the execution of distributed neural networks, enabling reliable and efficient AI workloads across multiple compute units.
Description
Emerging computing architectures increasingly rely on chiplet-based systems and distributed execution to efficiently run complex workloads such as AI inference. In such systems, computations and intermediate results are exchanged between multiple processing units. Ensuring the integrity and correctness of these computations becomes an important challenge, particularly in the presence of hardware faults, communication errors, or malicious manipulation.
Techniques for detecting computational errors have long been studied, for example, through Algorithm-based Fault Tolerance (ABFT) methods for linear algebra operations [1]. More recently, similar concepts have been explored for machine learning workloads and NN inference, where protecting intermediate results and detecting corrupted computations is becoming increasingly important [2], [3], [4], [5]. At the same time, emerging architectures such as chiplet-based systems introduce new challenges for ensuring reliable execution across distributed compute units.
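As a rough illustration of the checksum idea behind ABFT for matrix multiplication [1], the sketch below appends a column-checksum row to A and a row-checksum column to B, multiplies them, and checks that the checksums of the product still agree. It is a minimal pure-Python sketch, not the method of any cited work beyond that basic idea; all function names and the tolerance `tol` are assumptions chosen for illustration.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def with_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append one column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def find_faults(c_full, tol=1e-9):
    """Return (rows, columns) of the data block whose checksums disagree."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return bad_rows, bad_cols


# Usage: multiply the checksum-extended operands, then verify the result.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
clean = find_faults(c_full)
c_full[1][0] += 1  # inject a single corrupted element
faulty = find_faults(c_full)
```

A single corrupted element shows up as exactly one inconsistent row and one inconsistent column, which locates it. In a distributed setting, a receiving unit could run such a check on incoming intermediate results; with floating-point data, the tolerance has to absorb rounding error.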
This student project investigates mechanisms for verifying the correctness of distributed AI computations in heterogeneous and chiplet-based architectures. Possible directions include:
- Techniques for integrity verification of distributed AI inference
- Detection of faults or corrupted intermediate results
- Lightweight verification mechanisms based on algorithmic or system-level approaches
- Analysis of trade-offs between reliability, performance, and overhead
References
[1] Kuang-Hua Huang and J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix Operations," in IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984, doi: 10.1109/TC.1984.1676475.
[2] S. K. S. Hari, M. B. Sullivan, T. Tsai and S. W. Keckler, "Making Convolutions Resilient Via Algorithm-Based Error Detection Techniques," in IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 4, pp. 2546-2558, 1 July-Aug. 2022, doi: 10.1109/TDSC.2021.3063083.
[3] J. Hoefer, M. Stammler, F. Kreß, T. Hotfilter, T. Harbaum and J. Becker, "BayWatch: Leveraging Bayesian Neural Networks for Hardware Fault Tolerance and Monitoring," 2024 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Didcot, United Kingdom, 2024, pp. 1-6, doi: 10.1109/DFT63277.2024.10753546.
[4] Z. Chen, G. Li and K. Pattabiraman, "A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction," in IEEE Design & Test, doi: 10.1109/MDAT.2025.3618758.
[5] J. Kappes, J. Geier, P. van Kempen, D. Mueller-Gritschneder and U. Schlichtmann, "Automated Graph-level Passes for TinyML Fault Tolerance," 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy, 2025, pp. 1-9, doi: 10.1109/IJCNN64981.2025.11227379.
Requirements
Required:
- Interest in computer architecture, machine learning systems, or reliable computing
- Programming experience (e.g., C/C++ and Python, or similar)
- Experience with ML compilers such as IREE
- Motivation to work on research-oriented topics
Beneficial:
- Interest in virtual prototyping and simulation
- Experience with embedded software development
- Knowledge of machine learning methods
Contact
Apply with CV and Transcript of Records directly to: m.schirmer@tum.de
Supervisor:
Master's Theses
Web-Based Digital Microfluidic (DMF) Design Platform
Description
Project Overview
Digital Microfluidics (DMF) is a cutting-edge technology that enables the precise manipulation of minute fluid volumes (droplets) via electrical actuation. We currently have a functional web-based design tool that allows researchers to create custom PCB-based and glass-based DMF chips. This platform streamlines the transition from concept to manufacturable hardware by providing features like custom electrode placement, automated routing, and experiment definition.
We are looking for motivated students to join our follow-up project. The goal is to extend the platform's functional modules and refine the core routing algorithms to handle increasingly complex chip architectures.
Tasks
As a student on this project, you will focus on two primary areas:
1. Platform Extension & Feature Enhancement
- Integrated Path Planning: Develop an automated droplet path planning feature where users can select start and end points, and the system generates the optimal movement sequence.
- Functional Module Libraries: Create templates and interfaces for specialized biological and chemical detection modules to improve design efficiency for specific experimental scenarios.
- Advanced UI/UX: Enhance the interactive editor, building upon existing features like "undo/redo," "copy/paste," and the "parallel electrode" batch processing system.
2. Routing Algorithm Refinement
- Algorithm Optimization: Work with our existing WebAssembly (WASM) and Web Worker-based routing engine to improve performance and success rates for high-density designs.
- Geometric Refinement: Modify the grid-based routing and collision detection logic to support finer electrode spacings and complex trace widths.
- Via Management: Refine the dynamic via cost mechanisms to optimize vertical interconnections between PCB layers.
Technical Environment
You will work with a modern, high-performance tech stack:
- Frontend: Vue 3, Element Plus, and SVG for vector graphics rendering.
- Core Logic: C++ (compiled to WebAssembly) for heavy computational tasks.
- Communication: Web Serial API for real-time hardware interfacing.
- Hardware Integration: Exporting KiCad-compatible files for physical PCB manufacturing.
Requirements
- Strong interest in Electronic Design Automation (EDA) or Microfluidics.
- Proficiency in JavaScript/TypeScript (preferably Vue 3) or C++.
- Basic understanding of geometric algorithms or PCB design is a plus.
Contact
If you are interested, please contact:
Be sure to include your current transcript and CV with your message.
Supervisor:
Multi-Objective Optimization of Performance, Energy, and Lifetime for Hybrid Optical-Electrical NoCs
Description
As data communication demands in many-core systems grow dramatically, networks-on-chip (NoCs) have emerged as an efficient framework for on-chip communication. Electrical Networks-on-Chip (ENoCs) and Wavelength-Routed Optical Networks-on-Chip (WRONoCs) are both considered promising solutions. WRONoCs provide high bandwidth and low latency, while data transmission in ENoCs is more energy-efficient. To combine their advantages, hybrid Electrical-Optical Networks-on-Chip have been proposed that integrate both transmission paradigms, allowing data to be transmitted through either electrical or optical paths. However, whether such combined architectures can fully realize their potential largely depends on how task mapping is performed. This thesis explores task mapping for the joint optimization of performance, energy, and lifetime in a hybrid optical-electrical NoC architecture.
Requirements
Applicants are expected to have:
- A background in computer architecture, computer engineering, electrical engineering, or related fields
- Basic knowledge of Networks-on-Chip (NoC) and interest in Optical NoC (ONoC)
- Strong programming experience (e.g., Python, C/C++, or MATLAB)
Experience in task mapping optimization or with optical routers is a plus.
Contact
If you are interested in this thesis topic, please send your CV and academic transcript to:
jiahui.peng@tum.de
Supervisor:
Open Research Topic: AI for Hardware Design & Systems
AI for Systems, Hardware Design, Machine Learning, Optimization
Do you have a novel idea at the intersection of AI/ML and hardware design? We are looking for highly motivated students to propose and pursue their own research ideas in this space—from applying modern AI techniques to traditional hardware problems to exploring entirely new directions.
Description
The intersection of artificial intelligence and hardware/system design is rapidly evolving. Many traditional problems in areas such as chip design, optimization, and system architecture are being revisited with modern machine learning techniques—yet there is still vast untapped potential for new ideas.
This open topic is aimed at students who want to go beyond predefined projects and instead explore their own research direction. We are particularly interested in novel and creative approaches, including (but not limited to):
- Applying machine learning to classical hardware or EDA problems
- Reinforcement learning or optimization for system design and scheduling
- AI-driven design space exploration or co-design approaches
- Using modern paradigms such as foundation models or autonomous research/optimization agents
- Completely new ideas that challenge existing workflows or assumptions
The goal is to identify promising research directions and develop them into meaningful projects, with the potential to grow into a thesis or even a research publication.
You will work closely with your supervisor to refine your idea, scope the problem, and develop a concrete research plan, but the starting point should come from you.
Requirements
- Strong interest in research and innovation
- Familiarity with machine learning and/or systems is expected
- Ability to think independently and propose original ideas
- High motivation and curiosity
Contact
Please send:
- A short description of your idea (what you want to explore and why it is interesting)
- Your CV
- Your transcript of records
Supervisor:
AI-Driven Optimization for Chip Design (Macro Placement)
Chip Design, Physical Design Automation, Optimization
We are looking for motivated students to work on algorithmic approaches for chip design optimization, with a focus on macro placement. The project combines machine learning and combinatorial optimization and can be connected to an ongoing industry challenge with a submission deadline in May 2026.
Description
Modern chip design faces increasingly complex optimization challenges, where millions of design decisions must be made under tight constraints. One key problem in this space is macro placement: arranging large components (e.g., SRAM blocks, IPs) on a chip such that routing congestion, timing, power delivery, and area are jointly optimized.
This problem is inherently difficult: it involves a highly discrete design space, multiple competing objectives, and strong global dependencies between decisions. Classical methods have been refined for decades, yet recent advances in machine learning—particularly reinforcement learning and graph-based methods—suggest new opportunities for improvement.
In this research internship, you will work on developing and evaluating novel approaches for macro placement and related optimization problems. The focus is on designing efficient algorithms that can handle large-scale, highly constrained systems and produce high-quality solutions under realistic runtime constraints.
Possible directions include:
- learning-based approaches (e.g., reinforcement learning, GNNs),
- hybrid optimization methods combining heuristics and ML,
- scalable search and approximation techniques,
- or improving classical placement strategies with modern tooling.
As part of the project, there is the opportunity to evaluate your approach on a current industry-backed challenge:
https://partcl.com/blog/macro_placement_challenge
(Submission deadline: May 21, 2026)
While participation in the challenge is optional, it provides a concrete benchmark and external validation for your work. Projects may be conducted in small supervised groups, but individual contributions and reports are required.
The topic is well suited for continuation into a larger thesis project in the area of machine learning for systems and chip design.
Requirements
- Background in Computer Science, Electrical Engineering, or related field
- Strong programming skills
- Interest in optimization, algorithms, or machine learning
- Familiarity with ML methods (e.g., RL, deep learning, or GNNs) is a plus
- Strong problem-solving skills and willingness to work on complex systems
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Energy-Efficient AI Systems at Scale: From Optimization Models to Next-Generation Hardware & Tools
AI Systems, Hardware-Software Co-Design, Optimization, Energy Efficiency, Chiplets
Modern AI systems are pushing hardware to its limits, requiring new approaches to efficiently scale compute, memory, and energy. In this thesis, you will explore and extend cutting-edge optimization frameworks for large-scale AI workloads, with the opportunity to shape the direction of your research, from improving solver efficiency to building interactive tools or exploring learning-based optimization strategies.
Description
Recent advances in AI/ML models (e.g., large language models) demand unprecedented compute and memory resources, making efficient system design a central challenge. New approaches, such as energy-aware co-optimization of hardware architectures and workload execution, enable significant improvements in energy-delay efficiency and scalability.
This thesis builds on such optimization-driven frameworks and opens up a range of possible research directions. Rather than prescribing a fixed path, the goal is to let you explore and define your own contribution within this space, depending on your interests.
Possible directions include (but are not limited to):
- Optimization & Algorithms
  - Improve scalability and efficiency of optimization solvers (e.g., MIQP-based approaches)
  - Develop approximation, heuristic, or hybrid optimization techniques
  - Explore alternative formulations for large-scale design space exploration
- AI for Systems / Learning-Based Methods
  - Investigate reinforcement learning or learning-based approaches for scheduling, mapping, or architecture design
  - Compare learned vs. analytical optimization strategies
- Scalable Systems & Workloads
  - Extend analyses to extremely large workloads (e.g., LLM-scale systems)
  - Study trade-offs between performance, energy, and hardware constraints
- Hardware & Architecture Exploration
  - Analyze emerging architectures such as multi-chiplet systems
  - Explore memory hierarchies, interconnects, and power management strategies
- Tooling & Visualization
  - Develop intuitive interfaces or visual analytics tools for design space exploration
  - Make complex optimization results interpretable and interactive
The work can be adapted toward a more theoretical, systems-oriented, or practical/software-driven thesis. We aim to produce publishable research results, making this an excellent opportunity for students interested in academia or advanced R&D.
Requirements
- Solid programming skills in Python
- Basic understanding of optimization, algorithms, or AI/ML concepts
- Interest in systems, performance, or hardware-software co-design
- Ability and motivation to quickly learn new concepts across multiple domains
- Strong analytical thinking and problem-solving skills
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Integrity Verification Schemes for Distributed AI Inference on Chiplets
Integrity, Safety, Security, Fault Tolerance
This project focuses on integrity verification mechanisms for distributed AI inference on chiplet-based architectures. The goal is to analyze and evaluate lightweight techniques for detecting faults or corrupted intermediate results during the execution of distributed neural networks, enabling reliable and efficient AI workloads across multiple compute units.
Description
Emerging computing architectures increasingly rely on chiplet-based systems and distributed execution to efficiently run complex workloads such as AI inference. In such systems, computations and intermediate results are exchanged between multiple processing units. Ensuring the integrity and correctness of these computations becomes an important challenge, particularly in the presence of hardware faults, communication errors, or malicious manipulation.
Techniques for detecting computational errors have long been studied, for example, through Algorithm-based Fault Tolerance (ABFT) methods for linear algebra operations [1]. More recently, similar concepts have been explored for machine learning workloads and NN inference, where protecting intermediate results and detecting corrupted computations is becoming increasingly important [2], [3], [4], [5]. At the same time, emerging architectures such as chiplet-based systems introduce new challenges for ensuring reliable execution across distributed compute units.
This student project investigates mechanisms for verifying the correctness of distributed AI computations in heterogeneous and chiplet-based architectures. Possible directions include:
- Techniques for integrity verification of distributed AI inference
- Detection of faults or corrupted intermediate results
- Lightweight verification mechanisms based on algorithmic or system-level approaches
- Analysis of trade-offs between reliability, performance, and overhead
References
[1] Kuang-Hua Huang and J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix Operations," in IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984, doi: 10.1109/TC.1984.1676475.
[2] S. K. S. Hari, M. B. Sullivan, T. Tsai and S. W. Keckler, "Making Convolutions Resilient Via Algorithm-Based Error Detection Techniques," in IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 4, pp. 2546-2558, 1 July-Aug. 2022, doi: 10.1109/TDSC.2021.3063083.
[3] J. Hoefer, M. Stammler, F. Kreß, T. Hotfilter, T. Harbaum and J. Becker, "BayWatch: Leveraging Bayesian Neural Networks for Hardware Fault Tolerance and Monitoring," 2024 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Didcot, United Kingdom, 2024, pp. 1-6, doi: 10.1109/DFT63277.2024.10753546.
[4] Z. Chen, G. Li and K. Pattabiraman, "A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction," in IEEE Design & Test, doi: 10.1109/MDAT.2025.3618758.
[5] J. Kappes, J. Geier, P. van Kempen, D. Mueller-Gritschneder and U. Schlichtmann, "Automated Graph-level Passes for TinyML Fault Tolerance," 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy, 2025, pp. 1-9, doi: 10.1109/IJCNN64981.2025.11227379.
Requirements
Required:
- Interest in computer architecture, machine learning systems, or reliable computing
- Programming experience (e.g., C/C++ and Python, or similar)
- Experience with ML compilers such as IREE
- Motivation to work on research-oriented topics
Beneficial:
- Interest in virtual prototyping and simulation
- Experience with embedded software development
- Knowledge of machine learning methods
Contact
Apply with CV and Transcript of Records directly to: m.schirmer@tum.de
Supervisor:
ISS-based Modeling and Evaluation of the RISC-V Integrated Matrix Extension (IME)
Virtual Prototyping, RISC-V, Machine Learning, Matrix Extensions, Simulation, ISS, ISA Modeling, ETISS, Python, C++, C
This project focuses on the instruction-set simulator (ISS)–based modeling and evaluation of the RISC-V Integrated Matrix Extension (IME), which is currently under active development within the RISC-V community. The goal is to extend the ETISS simulator developed at the EDA Chair to support IME, enabling early software development, functional validation, and performance exploration of matrix-oriented workloads.
Description
Matrix extensions are a key building block for accelerating modern workloads such as machine learning, signal processing, and data analytics. There are three styles of matrix extensions in development by the RISC-V community:
• Vector-Matrix-Extension (VME)
• Integrated-Matrix-Extension (IME)
• Attached-Matrix-Extension (AME)
In this project, the student will model the proposed IME instructions at the ISA level and integrate them into the ETISS instruction set simulator using the CoreDSL2 ecosystem. This includes defining instruction encodings, assembler syntax, and functional behavior, as well as developing a host-side simulation library to emulate matrix operations.
The extended simulator will be used to evaluate the IME design through testing and benchmarking, allowing analysis of correctness, usability, and performance characteristics. The project provides hands-on experience with ISA design, virtual prototyping, and simulation-based evaluation of emerging hardware extensions, closely aligned with current industrial and academic research.
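As a rough illustration of what one routine in a host-side emulation library might look like, the sketch below computes an integer tile multiply-accumulate in plain Python. The names `mma_tile` and `wrap_int32`, the tile shapes, and the wrapping 32-bit accumulator are assumptions made for this example; they are not taken from the IME proposal, the softvector libraries, or ETISS.

```python
def wrap_int32(value):
    """Wrap an arbitrary Python int to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def mma_tile(acc, a, b):
    """Return acc + a @ b for small integer tiles given as nested lists.

    a is M x K, b is K x N, acc is M x N (hypothetical tile shapes).
    Results wrap to 32 bits; wrapping rather than saturating is an
    assumption of this sketch, not a statement about the specification.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(acc) != m or any(len(row) != n for row in acc):
        raise ValueError("accumulator shape does not match a @ b")
    out = [row[:] for row in acc]  # leave the caller's accumulator untouched
    for i in range(m):
        for j in range(n):
            total = out[i][j]
            for p in range(k):
                total += a[i][p] * b[p][j]
            out[i][j] = wrap_int32(total)
    return out
```

A routine of this kind could serve as a golden reference when testing the instruction semantics described in the ISA model.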
Tasks
- Study the proposed RISC-V IME specification and analyze differences compared to VME and AME approaches
- ISA-level modeling of IME instructions using CoreDSL2 (instruction encoding, assembler syntax, and semantic behavior)
- Development of a host-side simulation library (based on existing softvector libraries) to emulate matrix operations
- Integration of IME support into the ETISS simulator using the CoreDSL retargeting ecosystem
- Evaluation of the IME extension through testing, benchmarking, and example workloads
Reading Material
- IME Proposal (SpaceMiT): https://github.com/spacemit-com/riscv-ime-extension-spec
- Matrix Extension Proposal (T-Head): https://github.com/XUANTIE-RV/riscv-matrix-extension-spec?tab=readme-ov-file
- IME Mailing List: https://lists.riscv.org/g/tech-integrated-matrix-extension
- IME Charter: https://riscv.atlassian.net/wiki/spaces/IMEX/pages/46071925/Charter
- VME Mailing List:
- VME Charter: https://riscv.atlassian.net/wiki/spaces/VMEX/pages/663912452/Vector-Matrix+Extension+VME+Charter
- AME Mailing List: https://lists.riscv.org/g/tech-attached-matrix-extension
- AME Charter: https://riscv.atlassian.net/wiki/spaces/AMEX/pages/55083388/Charter
Prerequisites
Required:
- Proficiency in C/C++ and Python
- Basic knowledge of instruction set architectures (ISAs) (e.g., RISC-V)
- Experience with embedded software development
Beneficial:
- Interest in virtual prototyping and simulation
- Experience with compilers or code generation
- Knowledge of machine learning workloads and hardware acceleration concepts (e.g., GEMM, data layouts)
Contact
Apply with CV and Transcript of Records directly to: philipp.van-kempen@tum.de
Supervisor:
Physical Implementation of Post-Quantum Cryptography Modules Using Open-Source EDA Tools
Description
Post-quantum cryptography (PQC) is a key enabler for future quantum-safe systems, but the physical realization of PQC hardware modules remains challenging due to their high computational complexity and strict performance and energy constraints. While many PQC algorithms have been studied at the algorithmic and architectural levels, their physical design aspects using open-source EDA tools are still underexplored.
This thesis focuses on the physical implementation and evaluation of PQC hardware modules using an open-source EDA flow. The student will select representative PQC modules (e.g., key computational kernels or full accelerators) and implement them from RTL to layout using open-source tools for synthesis, placement, and routing. Different architectural and implementation configurations will be explored to study their impact on area, timing, and power. The outcome of this work will provide practical insights into the physical design trade-offs of PQC hardware and contribute to open and reproducible chip design methodologies.
Outstanding candidates may be considered for a short-term on-site research stay at TUMCREATE (Singapore), subject to mutual interest and project needs.
Prerequisites
- Solid background in digital circuit design and computer architecture
- Basic understanding of VLSI design flow (synthesis, place and route)
- Experience with hardware description languages (Verilog or SystemVerilog)
- Familiarity with Linux-based development environments
- Experience with scripting languages (e.g., Python or Tcl) is a plus
- Interest in hardware security or cryptographic hardware is desirable but not mandatory
Contact
If you are interested in this thesis topic, please send your CV and academic transcript to:
zhidan.zheng@tum-create.edu.sg
Supervisor:
Customized Optical Router Design and Task Mapping for Large-Scale Optical Networks-on-Chip
Description
As Optical Networks-on-Chip (ONoCs) scale to support an increasing number of cores and diverse communication patterns, a single, uniform router design is often insufficient to achieve optimal performance and energy efficiency. Different communication requirements—such as global data exchange and local traffic—may benefit from different types of optical routers and interconnection structures.
This thesis explores the customized design and composition of optical routers for large-scale ONoC systems. Instead of selecting a single router architecture, the project investigates how different router types (e.g., ring-based routers for global communication and compact routers for local communication) can be combined and deployed to better match application communication patterns. In addition, the thesis will address the task-to-node mapping problem, jointly considering application-level communication behavior and the underlying ONoC structure.
The goal is to develop a co-design framework that integrates router customization, network architecture, and task mapping, enabling more efficient and scalable optical network designs.
Prerequisites
Applicants are expected to have:
- A background in computer architecture, computer engineering, electrical engineering, or related fields
- Basic knowledge of Networks-on-Chip (NoC) and interest in Optical NoC (ONoC)
- Familiarity with or strong interest in task mapping / application mapping for parallel systems
- Understanding of router architectures and network topologies
- Programming experience (e.g., Python, C/C++, or MATLAB)
- Ability and willingness to work primarily in a remote setting, with regular online communication
Prior experience with optical routers, heterogeneous NoC design, or system-level optimization is a plus, but not mandatory.
Contact
If you are interested in this thesis topic and comfortable with a remote working mode, please send your CV and academic transcript to:
zhidan.zheng@tum-create.edu.sg
Supervisor:
Accelerating Fault Simulation at RTL on GPU Compute Clusters
RTL, fault injection, GPU, Safety, Security
Description
One of the crucial tasks in designing, testing, and verifying a digital system is the early estimation of its fault tolerance. For this, fault injection simulations can be used to evaluate this tolerance at different phases of development. At the Register Transfer Level (RTL), an existing but not yet implemented hardware design can be simulated with higher accuracy than its simulation at the instruction or algorithm level. However, accuracy comes at a cost that grows with the number of simulations done within a fault injection analysis. For example, fault injection simulation of a CPU at RTL instead of Instruction Set Architecture (ISA) level would increase the simulation effort to include micro-architectural registers (pipeline, functional units, etc.).
To allow faster fault space exploration at RTL, one could do the following:
(a) accelerate the simulation of an individual Device Under Test (DUT) and fault, or
(b) launch multiple fault simulations concurrently.
In the case of (a), state-of-the-art research on fast RTL simulations has aimed to reduce simulation cost by multi-threaded simulation on CPUs [1][2][3] or GPUs [4][5][8].
In case (b), multiple independent simulations may be launched simultaneously on a distributed compute cluster. Parallelizing an individual simulation matters less here, since the compute platform is already utilized through task-level parallelism across long-running simulations, e.g., one CPU core per fault experiment.
Existing solutions for (b) mainly target CPU-based clusters [6][7], whereas work on (a) aims for the maximum speed of a single DUT simulation. In this work, we want to explore efficient fault simulation at RTL on GPU-based compute clusters that maximizes utilization through the number of individual experiments launched concurrently.
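Option (b) can be pictured with a minimal Python sketch: each fault experiment is an independent run of the same design, and its final state is compared against a fault-free golden run. The toy 8-bit counter, the `(cycle, bit)` fault format, and the names `run_counter` and `run_campaign` are purely illustrative assumptions; they do not reflect the interface of Verilator, VRTLmod, or GEM. A thread pool is used only to keep the example self-contained, whereas a real campaign would distribute experiments over processes, cluster nodes, or GPU batches.

```python
from concurrent.futures import ThreadPoolExecutor


def run_counter(cycles, fault=None):
    """Simulate a toy 8-bit counter; optionally flip one bit at one cycle.

    fault is a (cycle, bit) pair, or None for the fault-free golden run.
    """
    state = 0
    for cycle in range(cycles):
        if fault is not None and fault[0] == cycle:
            state ^= 1 << fault[1]  # single-bit upset in the register
        state = (state + 1) & 0xFF
    return state


def run_campaign(cycles, faults, workers=4):
    """Run every fault experiment independently and classify its outcome."""
    golden = run_counter(cycles)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda f: run_counter(cycles, f), faults))
    # A fault is "masked" if the final state equals the golden final state.
    return {f: ("masked" if r == golden else "corrupted")
            for f, r in zip(faults, results)}
```

Because the experiments share no state, the number of workers can be scaled freely, which is the property a cluster-level scheduler would exploit.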
Tasks:
- Set up the GPU-accelerated RTL simulator GEM [8][9]
- Implement a new fault injection arbitration framework for the RTL simulator
- Explore multi-experiment partitioning for fault simulation on the new platform
- Compare and benchmark against a CPU-based fault exploration scheme [6][7]
References:
- [1] W. Snyder, P. Wasson, D. Galbi, et al. Verilator. https://github.com/verilator/verilator, 2019. [Online].
- [2] S. Beamer and D. Donofrio, "Efficiently Exploiting Low Activity Factors to Accelerate RTL Simulation," 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2020, pp. 1-6, doi: 10.1109/DAC18072.2020.9218632.
- [3] Kexing Zhou, Yun Liang, Yibo Lin, Runsheng Wang, and Ru Huang. 2023. Khronos: Fusing Memory Access for Improved Hardware RTL Simulation. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23). Association for Computing Machinery, New York, NY, USA, 180–193. https://doi.org/10.1145/3613424.3614301
- [4] H. Qian and Y. Deng, "Accelerating RTL simulation with GPUs," 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2011, pp. 687-693, doi: 10.1109/ICCAD.2011.6105404.
- [5] Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany, and Tsung-Wei Huang. 2023. From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus. In Proceedings of the 51st International Conference on Parallel Processing (ICPP '22). Association for Computing Machinery, New York, NY, USA, Article 88, 1–12. https://doi.org/10.1145/3545008.3545091
- [6] Johannes Geier and Daniel Mueller-Gritschneder. 2023. VRTLmod: An LLVM based Open-source Tool to Enable Fault Injection in Verilator RTL Simulations. In Proceedings of the 20th ACM International Conference on Computing Frontiers (CF '23). Association for Computing Machinery, New York, NY, USA, 387–388. https://doi.org/10.1145/3587135.3591435
- [7] J. Geier, L. Kontopoulos, D. Mueller-Gritschneder and U. Schlichtmann, "Rapid Fault Injection Simulation by Hash-Based Differential Fault Effect Equivalence Checks," 2025 Design, Automation & Test in Europe Conference (DATE), Lyon, France, 2025, pp. 1-7, doi: 10.23919/DATE64628.2025.10993266.
- [8] Guo, Zizheng, et al. "GEM: GPU-Accelerated Emulator-Inspired RTL Simulation." https://guozz.cn/publication/gemdac-25/gemdac-25.pdf
- [9] NVlabs, GEM, GitHub repository. https://github.com/NVlabs/GEM
Prerequisites
- Excellent C++ and Python skills
- Good understanding of GPU programming (CUDA) or interest to learn
- Decent knowledge of hardware design languages (Verilog, VHDL) and EDA tools (Vivado, Yosys)
- Decent knowledge of statistics and probability
Contact
Apply with CV and Transcript of Records directly to:
johannes.geier@tum.de
Supervisor:
Interdisciplinary Projects
Web-Based Digital Microfluidic (DMF) Design Platform
Description
Project Overview
Digital Microfluidics (DMF) is a cutting-edge technology that enables the precise manipulation of minute fluid volumes (droplets) via electrical actuation. We currently have a functional web-based design tool that allows researchers to create custom PCB-based and glass-based DMF chips. This platform streamlines the transition from concept to manufacturable hardware by providing features like custom electrode placement, automated routing, and experiment definition.
We are looking for motivated students to join our follow-up project. The goal is to extend the platform's functional modules and refine the core routing algorithms to handle increasingly complex chip architectures.
Tasks
As a student on this project, you will focus on two primary areas:
1. Platform Extension & Feature Enhancement
- Integrated Path Planning: Develop an automated droplet path planning feature where users can select start and end points, and the system generates the optimal movement sequence.
- Functional Module Libraries: Create templates and interfaces for specialized biological and chemical detection modules to improve design efficiency for specific experimental scenarios.
- Advanced UI/UX: Enhance the interactive editor, building upon existing features like "undo/redo," "copy/paste," and the "parallel electrode" batch processing system.
2. Routing Algorithm Refinement
- Algorithm Optimization: Work with our existing WebAssembly (WASM) and Web Worker-based routing engine to improve performance and success rates for high-density designs.
- Geometric Refinement: Modify the grid-based routing and collision detection logic to support finer electrode spacings and complex trace widths.
- Via Management: Refine the dynamic via cost mechanisms to optimize vertical interconnections between PCB layers.
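To make the interplay between grid-based routing and via costs more tangible, here is a minimal Python sketch of a shortest-path search on a two-layer grid where changing layers costs more than a lateral step. It is not the platform's routing engine (which is described as C++ compiled to WebAssembly); the function name `route`, the `(x, y, layer)` cell encoding, and the fixed `via_cost` parameter are assumptions for illustration, and a dynamic via cost would adjust that value during routing.

```python
import heapq


def route(grid_w, grid_h, blocked, start, goal, via_cost=5):
    """Dijkstra over (x, y, layer) cells on a two-layer grid.

    blocked is a set of (x, y, layer) cells that traces may not use.
    Returns (total_cost, path) or None if the goal is unreachable.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in prev:  # walk predecessors back to the start
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        x, y, layer = node
        steps = [((x + dx, y + dy, layer), 1)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        steps.append(((x, y, 1 - layer), via_cost))  # via to the other layer
        for nxt, step_cost in steps:
            nx, ny, _ = nxt
            if not (0 <= nx < grid_w and 0 <= ny < grid_h) or nxt in blocked:
                continue
            new_cost = cost + step_cost
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None
```

Raising `via_cost` makes the search prefer longer single-layer detours over layer changes, which is the trade-off a via cost mechanism tunes.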
Technical Environment
You will work with a modern, high-performance tech stack:
- Frontend: Vue 3, Element Plus, and SVG for vector graphics rendering.
- Core Logic: C++ (compiled to WebAssembly) for heavy computational tasks.
- Communication: Web Serial API for real-time hardware interfacing.
- Hardware Integration: Exporting KiCad-compatible files for physical PCB manufacturing.
Requirements
- Strong interest in Electronic Design Automation (EDA) or Microfluidics.
- Proficiency in JavaScript/TypeScript (preferably Vue 3) or C++.
- Basic understanding of geometric algorithms or PCB design is a plus.
Contact
If you are interested, please contact:
Be sure to include your current transcript and CV with your message.
Supervisor:
Open Research Topic: AI for Hardware Design & Systems
AI for Systems, Hardware Design, Machine Learning, Optimization
Do you have a novel idea at the intersection of AI/ML and hardware design? We are looking for highly motivated students to propose and pursue their own research ideas in this space—from applying modern AI techniques to traditional hardware problems to exploring entirely new directions.
Description
The intersection of artificial intelligence and hardware/system design is rapidly evolving. Many traditional problems in areas such as chip design, optimization, and system architecture are being revisited with modern machine learning techniques—yet there is still vast untapped potential for new ideas.
This open topic is aimed at students who want to go beyond predefined projects and instead explore their own research direction. We are particularly interested in novel and creative approaches, including (but not limited to):
- Applying machine learning to classical hardware or EDA problems
- Reinforcement learning or optimization for system design and scheduling
- AI-driven design space exploration or co-design approaches
- Using modern paradigms such as foundation models or autonomous research/optimization agents
- Completely new ideas that challenge existing workflows or assumptions
The goal is to identify promising research directions and develop them into meaningful projects, with the potential to grow into a thesis or even a research publication.
You will work closely with supervision to refine your idea, scope the problem, and develop a concrete research plan—but the starting point should come from you.
Prerequisites
- Strong interest in research and innovation
- Familiarity with machine learning and/or systems is expected
- Ability to think independently and propose original ideas
- High motivation and curiosity
Contact
Please send:
- A short description of your idea (what you want to explore and why it is interesting)
- Your CV
- Your transcript of records
Supervisor:
AI-Driven Optimization for Chip Design (Macro Placement)
Chip Design, Physical Design Automation, Optimization
We are looking for motivated students to work on algorithmic approaches for chip design optimization, with a focus on macro placement. The project combines machine learning and combinatorial optimization and can be connected to an ongoing industry challenge with a submission deadline in May 2026.
Description
Modern chip design faces increasingly complex optimization challenges, where millions of design decisions must be made under tight constraints. One key problem in this space is macro placement: arranging large components (e.g., SRAM blocks, IPs) on a chip such that routing congestion, timing, power delivery, and area are jointly optimized.
This problem is inherently difficult: it involves a highly discrete design space, multiple competing objectives, and strong global dependencies between decisions. Classical methods have been refined for decades, yet recent advances in machine learning—particularly reinforcement learning and graph-based methods—suggest new opportunities for improvement.
In this research internship, you will work on developing and evaluating novel approaches for macro placement and related optimization problems. The focus is on designing efficient algorithms that can handle large-scale, highly constrained systems and produce high-quality solutions under realistic runtime constraints.
Possible directions include:
- learning-based approaches (e.g., reinforcement learning, GNNs),
- hybrid optimization methods combining heuristics and ML,
- scalable search and approximation techniques,
- or improving classical placement strategies with modern tooling.
As part of the project, there is the opportunity to evaluate your approach on a current industry-backed challenge:
https://partcl.com/blog/macro_placement_challenge
(Submission deadline: May 21, 2026)
While participation in the challenge is optional, it provides a concrete benchmark and external validation for your work. Projects may be conducted in small supervised groups, but individual contributions and reports are required.
The topic is well suited for continuation into a larger thesis project in the area of machine learning for systems and chip design.
Prerequisites
- Background in Computer Science, Electrical Engineering, or related field
- Strong programming skills
- Interest in optimization, algorithms, or machine learning
- Familiarity with ML methods (e.g., RL, deep learning, or GNNs) is a plus
- Strong problem-solving skills and willingness to work on complex systems
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Energy-Efficient AI Systems at Scale: From Optimization Models to Next-Generation Hardware & Tools
AI Systems, Hardware-Software Co-Design, Optimization, Energy Efficiency, Chiplets
Modern AI systems are pushing hardware to its limits, requiring new approaches to efficiently scale compute, memory, and energy. In this thesis, you will explore and extend cutting-edge optimization frameworks for large-scale AI workloads, with the opportunity to shape the direction of your research, from improving solver efficiency to building interactive tools or exploring learning-based optimization strategies.
Description
Recent advances in AI/ML models (e.g., large language models) demand unprecedented compute and memory resources, making efficient system design a central challenge. New approaches, such as energy-aware co-optimization of hardware architectures and workload execution, enable significant improvements in energy-delay efficiency and scalability.
This thesis builds on such optimization-driven frameworks and opens up a range of possible research directions. Rather than prescribing a fixed path, the goal is to let you explore and define your own contribution within this space, depending on your interests.
Possible directions include (but are not limited to):
- Optimization & Algorithms
  - Improve scalability and efficiency of optimization solvers (e.g., MIQP-based approaches)
  - Develop approximation, heuristic, or hybrid optimization techniques
  - Explore alternative formulations for large-scale design space exploration
- AI for Systems / Learning-Based Methods
  - Investigate reinforcement learning or learning-based approaches for scheduling, mapping, or architecture design
  - Compare learned vs. analytical optimization strategies
- Scalable Systems & Workloads
  - Extend analyses to extremely large workloads (e.g., LLM-scale systems)
  - Study trade-offs between performance, energy, and hardware constraints
- Hardware & Architecture Exploration
  - Analyze emerging architectures such as multi-chiplet systems
  - Explore memory hierarchies, interconnects, and power management strategies
- Tooling & Visualization
  - Develop intuitive interfaces or visual analytics tools for design space exploration
  - Make complex optimization results interpretable and interactive
The work can be adapted toward a more theoretical, systems-oriented, or practical/software-driven thesis. We aim to produce publishable research results, making this an excellent opportunity for students interested in academia or advanced R&D.
Prerequisites
- Solid programming skills in Python
- Basic understanding of optimization, algorithms, or AI/ML concepts
- Interest in systems, performance, or hardware-software co-design
- Ability and motivation to quickly learn new concepts across multiple domains
- Strong analytical thinking and problem-solving skills
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Accelerating Fault Simulation at RTL on GPU Compute Clusters
RTL, fault injection, GPU, Safety, Security
Description
One of the crucial tasks in designing, testing, and verifying a digital system is the early estimation of its fault tolerance. For this, fault injection simulations can be used to evaluate this tolerance at different phases of development. At the Register Transfer Level (RTL), an existing but not yet implemented hardware design can be simulated with higher accuracy than its simulation at the instruction or algorithm level. However, accuracy comes at a cost that grows with the number of simulations done within a fault injection analysis. For example, fault injection simulation of a CPU at RTL instead of Instruction Set Architecture (ISA) level would increase the simulation effort to include micro-architectural registers (pipeline, functional units, etc.).
To allow faster fault space exploration at RTL, one could do the following:
(a) accelerate the simulation of an individual Device Under Test (DUT) and fault, or
(b) launch multiple fault simulations concurrently.
In the case of (a), state-of-the-art research on fast RTL simulations has aimed to reduce simulation cost by multi-threaded simulation on CPUs [1][2][3] or GPUs [4][5][8].
In case (b), multiple independent simulations may be launched simultaneously on a distributed compute cluster. Parallelizing an individual simulation matters less here, since the compute platform is already utilized through task-level parallelism across long-running simulations, e.g., one CPU core per fault experiment.
Existing solutions for (b) mainly target CPU-based clusters [6][7], whereas work on (a) aims for the maximum speed of a single DUT simulation. In this work, we want to explore efficient fault simulation at RTL on GPU-based compute clusters that maximizes utilization through the number of individual experiments launched concurrently.
Tasks:
- Set up the GPU-accelerated RTL simulator GEM [8][9]
- Implement a new fault injection arbitration framework for the RTL simulator
- Explore multi-experiment partitioning for fault simulation on the new platform
- Compare and benchmark against a CPU-based fault exploration scheme [6][7]
References:
- [1] W. Snyder, P. Wasson, D. Galbi, et al. Verilator. https://github.com/verilator/verilator, 2019. [Online].
- [2] S. Beamer and D. Donofrio, "Efficiently Exploiting Low Activity Factors to Accelerate RTL Simulation," 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2020, pp. 1-6, doi: 10.1109/DAC18072.2020.9218632.
- [3] Kexing Zhou, Yun Liang, Yibo Lin, Runsheng Wang, and Ru Huang. 2023. Khronos: Fusing Memory Access for Improved Hardware RTL Simulation. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23). Association for Computing Machinery, New York, NY, USA, 180–193. https://doi.org/10.1145/3613424.3614301
- [4] H. Qian and Y. Deng, "Accelerating RTL simulation with GPUs," 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2011, pp. 687-693, doi: 10.1109/ICCAD.2011.6105404.
- [5] Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany, and Tsung-Wei Huang. 2023. From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus. In Proceedings of the 51st International Conference on Parallel Processing (ICPP '22). Association for Computing Machinery, New York, NY, USA, Article 88, 1–12. https://doi.org/10.1145/3545008.3545091
- [6] Johannes Geier and Daniel Mueller-Gritschneder. 2023. VRTLmod: An LLVM based Open-source Tool to Enable Fault Injection in Verilator RTL Simulations. In Proceedings of the 20th ACM International Conference on Computing Frontiers (CF '23). Association for Computing Machinery, New York, NY, USA, 387–388. https://doi.org/10.1145/3587135.3591435
- [7] J. Geier, L. Kontopoulos, D. Mueller-Gritschneder and U. Schlichtmann, "Rapid Fault Injection Simulation by Hash-Based Differential Fault Effect Equivalence Checks," 2025 Design, Automation & Test in Europe Conference (DATE), Lyon, France, 2025, pp. 1-7, doi: 10.23919/DATE64628.2025.10993266.
- [8] Guo, Zizheng, et al. "GEM: GPU-Accelerated Emulator-Inspired RTL Simulation." https://guozz.cn/publication/gemdac-25/gemdac-25.pdf
- [9] NVlabs, GEM, GitHub repository. https://github.com/NVlabs/GEM
Prerequisites
- Excellent C++ and Python skills
- Good understanding of GPU programming (CUDA) or interest to learn
- Decent knowledge of hardware design languages (Verilog, VHDL) and EDA tools (Vivado, Yosys)
- Decent knowledge of statistics and probability
Contact
Apply with CV and Transcript of Records directly to:
johannes.geier@tum.de
Supervisor:
Forschungspraxis (Research Internships)
Open Research Topic: AI for Hardware Design & Systems
AI for Systems, Hardware Design, Machine Learning, Optimization
Do you have a novel idea at the intersection of AI/ML and hardware design? We are looking for highly motivated students to propose and pursue their own research ideas in this space—from applying modern AI techniques to traditional hardware problems to exploring entirely new directions.
Description
The intersection of artificial intelligence and hardware/system design is rapidly evolving. Many traditional problems in areas such as chip design, optimization, and system architecture are being revisited with modern machine learning techniques—yet there is still vast untapped potential for new ideas.
This open topic is aimed at students who want to go beyond predefined projects and instead explore their own research direction. We are particularly interested in novel and creative approaches, including (but not limited to):
- Applying machine learning to classical hardware or EDA problems
- Reinforcement learning or optimization for system design and scheduling
- AI-driven design space exploration or co-design approaches
- Using modern paradigms such as foundation models or autonomous research/optimization agents
- Completely new ideas that challenge existing workflows or assumptions
The goal is to identify promising research directions and develop them into meaningful projects, with the potential to grow into a thesis or even a research publication.
You will work closely with supervision to refine your idea, scope the problem, and develop a concrete research plan—but the starting point should come from you.
Prerequisites
- Strong interest in research and innovation
- Familiarity with machine learning and/or systems is expected
- Ability to think independently and propose original ideas
- High motivation and curiosity
Contact
Please send:
- A short description of your idea (what you want to explore and why it is interesting)
- Your CV
- Your transcript of records
Supervisor:
AI-Driven Optimization for Chip Design (Macro Placement)
Chip Design, Physical Design Automation, Optimization
We are looking for motivated students to work on algorithmic approaches for chip design optimization, with a focus on macro placement. The project combines machine learning and combinatorial optimization and can be connected to an ongoing industry challenge with a submission deadline in May 2026.
Description
Modern chip design faces increasingly complex optimization challenges, where millions of design decisions must be made under tight constraints. One key problem in this space is macro placement: arranging large components (e.g., SRAM blocks, IPs) on a chip such that routing congestion, timing, power delivery, and area are jointly optimized.
This problem is inherently difficult: it involves a highly discrete design space, multiple competing objectives, and strong global dependencies between decisions. Classical methods have been refined for decades, yet recent advances in machine learning—particularly reinforcement learning and graph-based methods—suggest new opportunities for improvement.
In this research internship, you will work on developing and evaluating novel approaches for macro placement and related optimization problems. The focus is on designing efficient algorithms that can handle large-scale, highly constrained systems and produce high-quality solutions under realistic runtime constraints.
Possible directions include:
- learning-based approaches (e.g., reinforcement learning, GNNs),
- hybrid optimization methods combining heuristics and ML,
- scalable search and approximation techniques,
- or improving classical placement strategies with modern tooling.
As part of the project, there is the opportunity to evaluate your approach on a current industry-backed challenge:
https://partcl.com/blog/macro_placement_challenge
(Submission deadline: May 21, 2026)
While participation in the challenge is optional, it provides a concrete benchmark and external validation for your work. Projects may be conducted in small supervised groups, but individual contributions and reports are required.
The topic is well suited for continuation into a larger thesis project in the area of machine learning for systems and chip design.
Prerequisites
- Background in Computer Science, Electrical Engineering, or related field
- Strong programming skills
- Interest in optimization, algorithms, or machine learning
- Familiarity with ML methods (e.g., RL, deep learning, or GNNs) is a plus
- Strong problem-solving skills and willingness to work on complex systems
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Advancing Efficient LLM Inference with Compute-in-Memory
Large Language Models, Compute-in-Memory, AI Hardware, Efficient Inference
We are looking for a motivated student to join a research project on hardware architectures for efficient large language model inference. The internship offers the opportunity to work on a timely topic at the intersection of AI systems, computer architecture, and emerging memory-centric computing.
Description
Large language models continue to push modern hardware systems to their limits. In particular, data movement, memory access costs, and energy efficiency have become central bottlenecks for scalable inference. Compute-in-memory architectures offer a promising direction to address these challenges by bringing computation closer to where data resides and thereby reducing the overhead of conventional processor-centric systems.
In this research internship, you will contribute to an ongoing project on compute-in-memory architectures for LLM inference. The focus will be on deepening and sharpening the current research by analyzing recent developments, strengthening technical comparisons, refining system-level insights, and helping consolidate the latest progress in this rapidly evolving area.
This may include:
- studying recent work on efficient LLM inference and compute-in-memory systems,
- comparing architectural approaches and identifying key design trade-offs,
- helping structure and sharpen technical arguments and insights,
- supporting the analysis of hardware, performance, and energy-efficiency aspects,
- and contributing to the further development of a research manuscript.
The internship is particularly well suited for students who are interested in academia and would like to gain hands-on experience in how research is developed and turned into a strong scientific outcome. You will work on a highly relevant topic with clear practical and research impact, and gain insight into current questions in next-generation AI hardware.
Voraussetzungen
- Strong interest in academic research
- Basic understanding of AI/ML and computer architecture concepts
- Strong motivation to read, understand, and structure technical research
- Independent working style and willingness to learn quickly
Kontakt
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Betreuer:
Energy-Efficient AI Systems at Scale: From Optimization Models to Next-Generation Hardware & Tools
AI Systems, Hardware-Software Co-Design, Optimization, Energy Efficiency, Chiplets
Modern AI systems are pushing hardware to its limits, requiring new approaches to efficiently scale compute, memory, and energy. In this thesis, you will explore and extend cutting-edge optimization frameworks for large-scale AI workloads, with the opportunity to shape the direction of your research, from improving solver efficiency to building interactive tools or exploring learning-based optimization strategies.
Beschreibung
Recent advances in AI/ML models (e.g., large language models) demand unprecedented compute and memory resources, making efficient system design a central challenge. New approaches, such as energy-aware co-optimization of hardware architectures and workload execution, enable significant improvements in energy-delay efficiency and scalability.
This thesis builds on such optimization-driven frameworks and opens up a range of possible research directions. Rather than prescribing a fixed path, the goal is to let you explore and define your own contribution within this space, depending on your interests.
Possible directions include (but are not limited to):
- Optimization & Algorithms
- Improve scalability and efficiency of optimization solvers (e.g., MIQP-based approaches)
- Develop approximation, heuristic, or hybrid optimization techniques
- Explore alternative formulations for large-scale design space exploration
- AI for Systems / Learning-Based Methods
- Investigate reinforcement learning or learning-based approaches for scheduling, mapping, or architecture design
- Compare learned vs. analytical optimization strategies
- Scalable Systems & Workloads
- Extend analyses to extremely large workloads (e.g., LLM-scale systems)
- Study trade-offs between performance, energy, and hardware constraints
- Hardware & Architecture Exploration
- Analyze emerging architectures such as multi-chiplet systems
- Explore memory hierarchies, interconnects, and power management strategies
- Tooling & Visualization
- Develop intuitive interfaces or visual analytics tools for design space exploration
- Make complex optimization results interpretable and interactive
The work can be adapted toward a more theoretical, systems-oriented, or practical/software-driven thesis. We aim to produce publishable research results, making this an excellent opportunity for students interested in academia or advanced R&D.
Voraussetzungen
- Solid programming skills in Python
- Basic understanding of optimization, algorithms, or AI/ML concepts
- Interest in systems, performance, or hardware-software co-design
- Ability and motivation to quickly learn new concepts across multiple domains
- Strong analytical thinking and problem-solving skills
Kontakt
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Betreuer:
Transferable Power Estimation Based on the NetTAG Framework
Beschreibung
Power dissipation of integrated circuits (ICs) is crucial, as it directly determines the battery life of edge devices as well as the cooling requirements of servers. Power-aware IC design therefore requires precise power modeling in the design flow. Usually, such a model maps input features, like input signal activities or the number of gates in the design, to dynamic, static, or total power. Recently, machine learning (ML)-based models have come into focus.
The drawback of ML-based models is their limited transferability from the circuit designs used in training to unseen circuits. Foundation models, like large language models, have shown great potential in other domains through their generalizability. Hence, they could also help with the transferability problem of power modeling. Foundation models specifically designed for ICs, like NetTAG [1], have been proposed. However, it is still open whether these complex frameworks provide a significant benefit to power modeling.
The goals of this project are:
- Get familiar with the NetTAG framework and its adapted version at the chair
- Design a downstream task for power estimation
- Evaluate the transferability of NetTAG combined with the downstream task
[1] Fang, Wenji, et al. "NetTAG: A Multimodal RTL-and-Layout-Aligned Netlist Foundation Model via Text-Attributed Graph." 2025 62nd ACM/IEEE Design Automation Conference (DAC). IEEE, 2025.
Voraussetzungen
- Profound knowledge of Python
- Excellent debugging skills
- Good knowledge of the digital IC design flow
- Good knowledge of HDL designs at RTL and netlist level (preferably, Verilog)
- Basic knowledge of foundation models
- Highly motivated, independent, and organized working style
Kontakt
If you are interested, please send your application to philipp.fengler@tum.de
Betreuer:
Integrity Verification Schemes for Distributed AI Inference on Chiplets
Integrity, Safety, Security, Fault Tolerance
This project focuses on integrity verification mechanisms for distributed AI inference on chiplet-based architectures. The goal is to analyze and evaluate lightweight techniques for detecting faults or corrupted intermediate results during the execution of distributed neural networks, enabling reliable and efficient AI workloads across multiple compute units.
Beschreibung
Emerging computing architectures increasingly rely on chiplet-based systems and distributed execution to efficiently run complex workloads such as AI inference. In such systems, computations and intermediate results are exchanged between multiple processing units. Ensuring the integrity and correctness of these computations becomes an important challenge, particularly in the presence of hardware faults, communication errors, or malicious manipulation.
Techniques for detecting computational errors have long been studied, for example, through Algorithm-based Fault Tolerance (ABFT) methods for linear algebra operations [1]. More recently, similar concepts have been explored for machine learning workloads and NN inference, where protecting intermediate results and detecting corrupted computations is becoming increasingly important [2], [3], [4], [5]. At the same time, emerging architectures such as chiplet-based systems introduce new challenges for ensuring reliable execution across distributed compute units.
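As a rough illustration of the ABFT idea from [1], the following minimal Python sketch augments the operands of a matrix product with checksums and then checks the result for consistency. The function names (`with_column_checksum`, `with_row_checksum`, `find_mismatches`) and the tiny integer matrices are illustrative assumptions and not part of any framework used in this project. With floating-point data, the equality checks would need a tolerance.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append one extra column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def find_mismatches(c_full):
    """Return (bad_rows, bad_cols): indices whose checksum does not match."""
    data = [row[:-1] for row in c_full[:-1]]
    bad_rows = [i for i, row in enumerate(data)
                if sum(row) != c_full[i][-1]]
    bad_cols = [j for j, col in enumerate(zip(*data))
                if sum(col) != c_full[-1][j]]
    return bad_rows, bad_cols


# Hypothetical operands, chosen only for illustration.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Multiplying the augmented operands yields the product plus its checksums.
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
clean = find_mismatches(c_full)

# Emulate a fault by corrupting one element of the result.
c_full[1][0] += 1
faulty = find_mismatches(c_full)

print(clean)
print(faulty)
```

For a single corrupted data element, the intersection of the mismatching row and column locates the fault. In a distributed setting, one could imagine each compute unit checking the partial results it receives in a similar way, although the actual verification scheme is a subject of the project.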
This student project investigates mechanisms for verifying the correctness of distributed AI computations in heterogeneous and chiplet-based architectures. Possible directions include:
- Techniques for integrity verification of distributed AI inference
- Detection of faults or corrupted intermediate results
- Lightweight verification mechanisms based on algorithmic or system-level approaches
- Analysis of trade-offs between reliability, performance, and overhead
References
[1] Kuang-Hua Huang and J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix Operations," in IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984, doi: 10.1109/TC.1984.1676475.
[2] S. K. S. Hari, M. B. Sullivan, T. Tsai and S. W. Keckler, "Making Convolutions Resilient Via Algorithm-Based Error Detection Techniques," in IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 4, pp. 2546-2558, 1 July-Aug. 2022, doi: 10.1109/TDSC.2021.3063083.
[3] J. Hoefer, M. Stammler, F. Kreß, T. Hotfilter, T. Harbaum and J. Becker, "BayWatch: Leveraging Bayesian Neural Networks for Hardware Fault Tolerance and Monitoring," 2024 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Didcot, United Kingdom, 2024, pp. 1-6, doi: 10.1109/DFT63277.2024.10753546.
[4] Z. Chen, G. Li and K. Pattabiraman, "A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction," in IEEE Design & Test, doi: 10.1109/MDAT.2025.3618758.
[5] J. Kappes, J. Geier, P. van Kempen, D. Mueller-Gritschneder and U. Schlichtmann, "Automated Graph-level Passes for TinyML Fault Tolerance," 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy, 2025, pp. 1-9, doi: 10.1109/IJCNN64981.2025.11227379.
Voraussetzungen
Required:
- Interest in computer architecture, machine learning systems, or reliable computing
- Programming experience (e.g., C/C++ and Python, or similar)
- Experience with ML compilers such as IREE
- Motivation to work on research-oriented topics
Beneficial:
- Interest in virtual prototyping and simulation
- Experience with embedded software development
- Knowledge of machine learning methods
Kontakt
Apply with CV and Transcript of Records directly to: m.schirmer@tum.de
Betreuer:
Logic Synthesis and Internal Mechanism Analysis of PQC Accelerators using Yosys
Beschreibung
As we enter the quantum era, traditional encryption is becoming vulnerable. Post-Quantum Cryptography (PQC) is the next-generation standard for securing global data. However, designing hardware for PQC requires specialized tools. Yosys is the world’s leading open-source logic synthesis framework, used to transform hardware descriptions (Verilog) into real circuit structures.
This is a hands-on, exploration-oriented project. You will:
- Master the Tool: Set up and explore the Yosys open-source synthesis framework in a Linux environment.
- Hardware Analysis: Run logic synthesis for PQC-related modules (e.g., Keccak/SHA-3 or Kyber operators) and visualize how they are mapped into gate-level circuits.
- Internal Discovery: Peek "under the hood" of Yosys to understand how it translates high-level code into optimized hardware netlists.
- Documentation: Create a "Technical Internal Guide" that explains Yosys’s key features and workflows to help the lab adopt this tool for future chip designs.
Voraussetzungen
- Core Knowledge: Basic understanding of Digital Logic (Gates, Flip-flops, etc.) and Verilog/SystemVerilog.
- Technical Skills: Comfortable working in a Linux command-line environment. Basic C++ or Python knowledge is a plus.
- Mindset: Curious, detail-oriented, and enjoys "breaking things down" to see how they work.
Kontakt
If you are interested in this thesis topic and comfortable with a remote working mode, please send your CV and academic transcript to:
zhidan.zheng@tum-create.edu.sg
Betreuer:
Physical Implementation of Post-Quantum Cryptography Modules Using Open-Source EDA Tools
Beschreibung
Post-quantum cryptography (PQC) is a key enabler for future quantum-safe systems, but the physical realization of PQC hardware modules remains challenging due to their high computational complexity and strict performance and energy constraints. While many PQC algorithms have been studied at the algorithmic and architectural levels, their physical design aspects using open-source EDA tools are still underexplored.
This thesis focuses on the physical implementation and evaluation of PQC hardware modules using an open-source EDA flow. The student will select representative PQC modules (e.g., key computational kernels or full accelerators) and implement them from RTL to layout using open-source tools for synthesis, placement, and routing. Different architectural and implementation configurations will be explored to study their impact on area, timing, and power. The outcome of this work will provide practical insights into the physical design trade-offs of PQC hardware and contribute to open and reproducible chip design methodologies.
Outstanding candidates may be considered for a short-term on-site research stay at TUMCREATE (Singapore), subject to mutual interest and project needs.
Voraussetzungen
- Solid background in digital circuit design and computer architecture
- Basic understanding of VLSI design flow (synthesis, place and route)
- Experience with hardware description languages (Verilog or SystemVerilog)
- Familiarity with Linux-based development environments
- Experience with scripting languages (e.g., Python or Tcl) is a plus
- Interest in hardware security or cryptographic hardware is desirable but not mandatory
Kontakt
If you are interested in this thesis topic, please send your CV and academic transcript to:
zhidan.zheng@tum-create.edu.sg
Betreuer:
Customized Optical Router Design and Task Mapping for Large-Scale Optical Networks-on-Chip
Beschreibung
As Optical Networks-on-Chip (ONoCs) scale to support an increasing number of cores and diverse communication patterns, a single, uniform router design is often insufficient to achieve optimal performance and energy efficiency. Different communication requirements—such as global data exchange and local traffic—may benefit from different types of optical routers and interconnection structures.
This thesis explores the customized design and composition of optical routers for large-scale ONoC systems. Instead of selecting a single router architecture, the project investigates how different router types (e.g., ring-based routers for global communication and compact routers for local communication) can be combined and deployed to better match application communication patterns. In addition, the thesis will address the task-to-node mapping problem, jointly considering application-level communication behavior and the underlying ONoC structure.
The goal is to develop a co-design framework that integrates router customization, network architecture, and task mapping, enabling more efficient and scalable optical network designs.
Voraussetzungen
Applicants are expected to have:
- A background in computer architecture, computer engineering, electrical engineering, or related fields
- Basic knowledge of Networks-on-Chip (NoC) and interest in Optical NoC (ONoC)
- Familiarity with or strong interest in task mapping / application mapping for parallel systems
- Understanding of router architectures and network topologies
- Programming experience (e.g., Python, C/C++, or MATLAB)
- Ability and willingness to work primarily in a remote setting, with regular online communication
Prior experience with optical routers, heterogeneous NoC design, or system-level optimization is a plus, but not mandatory.
Kontakt
If you are interested in this thesis topic and comfortable with a remote working mode, please send your CV and academic transcript to:
zhidan.zheng@tum-create.edu.sg
Betreuer:
Accelerating Fault Simulation at RTL on GPU Compute Clusters
RTL, fault injection, GPU, Safety, Security
Beschreibung
One of the crucial tasks in designing, testing, and verifying a digital system is the early estimation of its fault tolerance. Fault injection simulations can be used to evaluate this tolerance at different phases of development. At the Register Transfer Level (RTL), an existing but not yet implemented hardware design can be simulated with higher accuracy than at the instruction or algorithm level. However, this accuracy comes at a cost that grows with the number of simulations performed within a fault injection analysis. For example, fault injection simulation of a CPU at RTL instead of at the Instruction Set Architecture (ISA) level increases the simulation effort, since micro-architectural registers (pipeline, functional units, etc.) must be included.
To allow faster fault space exploration at RTL, one could do the following:
(a) accelerate the simulation of an individual Device Under Test (DUT) and fault, or
(b) launch multiple fault simulations concurrently.
In the case of (a), state-of-the-art research on fast RTL simulations has aimed to reduce simulation cost by multi-threaded simulation on CPUs [1][2][3] or GPUs [4][5][8].
In the case of (b), multiple independent simulations may be launched simultaneously on a distributed compute cluster. Parallelizing an individual simulation is less important here, since the compute platform is utilized anyway through task-level parallelism across individual, long-running simulations, i.e., one CPU core per fault experiment.
Existing solutions for (b) mainly target CPU-based clusters [6][7], whereas work on (a) aims at the maximum speed of a single DUT simulation. In this work, we want to explore efficient fault simulation at RTL on GPU-based compute clusters that maximizes utilization with the number of individual experiments launched.
Tasks:
- Set up GPU-accelerated RTL simulation of [8][9]
- Implement a new fault injection arbitration framework for the RTL simulator
- Explore multi-experiment partitioning for fault simulation on the new platform
- Compare and benchmark against a CPU-based fault exploration scheme [6][7]
References:
- [1] W. Snyder, P. Wasson, D. Galbi, et al. Verilator. https://github.com/verilator/verilator, 2019. [Online].
- [2] S. Beamer and D. Donofrio, "Efficiently Exploiting Low Activity Factors to Accelerate RTL Simulation," 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2020, pp. 1-6, doi: 10.1109/DAC18072.2020.9218632.
- [3] Kexing Zhou, Yun Liang, Yibo Lin, Runsheng Wang, and Ru Huang. 2023. Khronos: Fusing Memory Access for Improved Hardware RTL Simulation. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23). Association for Computing Machinery, New York, NY, USA, 180–193. https://doi.org/10.1145/3613424.3614301
- [4] H. Qian and Y. Deng, "Accelerating RTL simulation with GPUs," 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2011, pp. 687-693, doi: 10.1109/ICCAD.2011.6105404.
- [5] Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany, and Tsung-Wei Huang. 2023. From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus. In Proceedings of the 51st International Conference on Parallel Processing (ICPP '22). Association for Computing Machinery, New York, NY, USA, Article 88, 1–12. https://doi.org/10.1145/3545008.3545091
- [6] Johannes Geier and Daniel Mueller-Gritschneder. 2023. VRTLmod: An LLVM based Open-source Tool to Enable Fault Injection in Verilator RTL Simulations. In Proceedings of the 20th ACM International Conference on Computing Frontiers (CF '23). Association for Computing Machinery, New York, NY, USA, 387–388. https://doi.org/10.1145/3587135.3591435
- [7] J. Geier, L. Kontopoulos, D. Mueller-Gritschneder and U. Schlichtmann, "Rapid Fault Injection Simulation by Hash-Based Differential Fault Effect Equivalence Checks," 2025 Design, Automation & Test in Europe Conference (DATE), Lyon, France, 2025, pp. 1-7, doi: 10.23919/DATE64628.2025.10993266.
- [8] Guo, Zizheng, et al. "GEM: GPU-Accelerated Emulator-Inspired RTL Simulation." https://guozz.cn/publication/gemdac-25/gemdac-25.pdf
- [9] NVlabs/GEM on GitHub. https://github.com/NVlabs/GEM
Voraussetzungen
- Excellent C++ and Python skills
- Good understanding of GPU programming (CUDA) or interest in learning it
- Decent knowledge of hardware description languages (Verilog, VHDL) and EDA tools (Vivado, Yosys)
- Decent knowledge of statistics and probability
Kontakt
Apply with CV and Transcript of Records directly to:
johannes.geier@tum.de
Betreuer:
Ingenieurpraxis
Open Research Topic: AI for Hardware Design & Systems
AI for Systems, Hardware Design, Machine Learning, Optimization
Do you have a novel idea at the intersection of AI/ML and hardware design? We are looking for highly motivated students to propose and pursue their own research ideas in this space—from applying modern AI techniques to traditional hardware problems to exploring entirely new directions.
Beschreibung
The intersection of artificial intelligence and hardware/system design is rapidly evolving. Many traditional problems in areas such as chip design, optimization, and system architecture are being revisited with modern machine learning techniques—yet there is still vast untapped potential for new ideas.
This open topic is aimed at students who want to go beyond predefined projects and instead explore their own research direction. We are particularly interested in novel and creative approaches, including (but not limited to):
- Applying machine learning to classical hardware or EDA problems
- Reinforcement learning or optimization for system design and scheduling
- AI-driven design space exploration or co-design approaches
- Using modern paradigms such as foundation models or autonomous research/optimization agents
- Completely new ideas that challenge existing workflows or assumptions
The goal is to identify promising research directions and develop them into meaningful projects, with the potential to grow into a thesis or even a research publication.
You will work closely with supervision to refine your idea, scope the problem, and develop a concrete research plan—but the starting point should come from you.
Voraussetzungen
- Strong interest in research and innovation
- Familiarity with machine learning and/or systems is expected
- Ability to think independently and propose original ideas
- High motivation and curiosity
Kontakt
Please send:
- A short description of your idea (what you want to explore and why it is interesting)
- Your CV
- Your transcript of records
Betreuer:
AI-Driven Optimization for Chip Design (Macro Placement)
Chip Design, Physical Design Automation, Optimization
We are looking for motivated students to work on algorithmic approaches for chip design optimization, with a focus on macro placement. The project combines machine learning and combinatorial optimization and can be connected to an ongoing industry challenge with a submission deadline in May 2026.
Beschreibung
Modern chip design faces increasingly complex optimization challenges, where millions of design decisions must be made under tight constraints. One key problem in this space is macro placement: arranging large components (e.g., SRAM blocks, IPs) on a chip such that routing congestion, timing, power delivery, and area are jointly optimized.
This problem is inherently difficult: it involves a highly discrete design space, multiple competing objectives, and strong global dependencies between decisions. Classical methods have been refined for decades, yet recent advances in machine learning—particularly reinforcement learning and graph-based methods—suggest new opportunities for improvement.
In this research internship, you will work on developing and evaluating novel approaches for macro placement and related optimization problems. The focus is on designing efficient algorithms that can handle large-scale, highly constrained systems and produce high-quality solutions under realistic runtime constraints.
Possible directions include:
- learning-based approaches (e.g., reinforcement learning, GNNs),
- hybrid optimization methods combining heuristics and ML,
- scalable search and approximation techniques,
- or improving classical placement strategies with modern tooling.
As part of the project, there is the opportunity to evaluate your approach on a current industry-backed challenge:
https://partcl.com/blog/macro_placement_challenge
(Submission deadline: May 21, 2026)
While participation in the challenge is optional, it provides a concrete benchmark and external validation for your work. Projects may be conducted in small supervised groups, but individual contributions and reports are required.
The topic is well suited for continuation into a larger thesis project in the area of machine learning for systems and chip design.
Voraussetzungen
- Background in Computer Science, Electrical Engineering, or related field
- Strong programming skills
- Interest in optimization, algorithms, or machine learning
- Familiarity with ML methods (e.g., RL, deep learning, or GNNs) is a plus
- Strong problem-solving skills and willingness to work on complex systems
Kontakt
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Betreuer:
Energy-Efficient AI Systems at Scale: From Optimization Models to Next-Generation Hardware & Tools
AI Systems, Hardware-Software Co-Design, Optimization, Energy Efficiency, Chiplets
Modern AI systems are pushing hardware to its limits, requiring new approaches to efficiently scale compute, memory, and energy. In this thesis, you will explore and extend cutting-edge optimization frameworks for large-scale AI workloads, with the opportunity to shape the direction of your research, from improving solver efficiency to building interactive tools or exploring learning-based optimization strategies.
Beschreibung
Recent advances in AI/ML models (e.g., large language models) demand unprecedented compute and memory resources, making efficient system design a central challenge. New approaches, such as energy-aware co-optimization of hardware architectures and workload execution, enable significant improvements in energy-delay efficiency and scalability.
This thesis builds on such optimization-driven frameworks and opens up a range of possible research directions. Rather than prescribing a fixed path, the goal is to let you explore and define your own contribution within this space, depending on your interests.
Possible directions include (but are not limited to):
- Optimization & Algorithms
- Improve scalability and efficiency of optimization solvers (e.g., MIQP-based approaches)
- Develop approximation, heuristic, or hybrid optimization techniques
- Explore alternative formulations for large-scale design space exploration
- AI for Systems / Learning-Based Methods
- Investigate reinforcement learning or learning-based approaches for scheduling, mapping, or architecture design
- Compare learned vs. analytical optimization strategies
- Scalable Systems & Workloads
- Extend analyses to extremely large workloads (e.g., LLM-scale systems)
- Study trade-offs between performance, energy, and hardware constraints
- Hardware & Architecture Exploration
- Analyze emerging architectures such as multi-chiplet systems
- Explore memory hierarchies, interconnects, and power management strategies
- Tooling & Visualization
- Develop intuitive interfaces or visual analytics tools for design space exploration
- Make complex optimization results interpretable and interactive
The work can be adapted toward a more theoretical, systems-oriented, or practical/software-driven thesis. We aim to produce publishable research results, making this an excellent opportunity for students interested in academia or advanced R&D.
Voraussetzungen
- Solid programming skills in Python
- Basic understanding of optimization, algorithms, or AI/ML concepts
- Interest in systems, performance, or hardware-software co-design
- Ability and motivation to quickly learn new concepts across multiple domains
- Strong analytical thinking and problem-solving skills
Kontakt
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Betreuer:
Integrity Verification Schemes for Distributed AI Inference on Chiplets
Integrity, Safety, Security, Fault Tolerance
This project focuses on integrity verification mechanisms for distributed AI inference on chiplet-based architectures. The goal is to analyze and evaluate lightweight techniques for detecting faults or corrupted intermediate results during the execution of distributed neural networks, enabling reliable and efficient AI workloads across multiple compute units.
Beschreibung
Emerging computing architectures increasingly rely on chiplet-based systems and distributed execution to efficiently run complex workloads such as AI inference. In such systems, computations and intermediate results are exchanged between multiple processing units. Ensuring the integrity and correctness of these computations becomes an important challenge, particularly in the presence of hardware faults, communication errors, or malicious manipulation.
Techniques for detecting computational errors have long been studied, for example, through Algorithm-based Fault Tolerance (ABFT) methods for linear algebra operations [1]. More recently, similar concepts have been explored for machine learning workloads and NN inference, where protecting intermediate results and detecting corrupted computations is becoming increasingly important [2], [3], [4], [5]. At the same time, emerging architectures such as chiplet-based systems introduce new challenges for ensuring reliable execution across distributed compute units.
This student project investigates mechanisms for verifying the correctness of distributed AI computations in heterogeneous and chiplet-based architectures. Possible directions include:
- Techniques for integrity verification of distributed AI inference
- Detection of faults or corrupted intermediate results
- Lightweight verification mechanisms based on algorithmic or system-level approaches
- Analysis of trade-offs between reliability, performance, and overhead
References
[1] Kuang-Hua Huang and J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix Operations," in IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984, doi: 10.1109/TC.1984.1676475.
[2] S. K. S. Hari, M. B. Sullivan, T. Tsai and S. W. Keckler, "Making Convolutions Resilient Via Algorithm-Based Error Detection Techniques," in IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 4, pp. 2546-2558, 1 July-Aug. 2022, doi: 10.1109/TDSC.2021.3063083.
[3] J. Hoefer, M. Stammler, F. Kreß, T. Hotfilter, T. Harbaum and J. Becker, "BayWatch: Leveraging Bayesian Neural Networks for Hardware Fault Tolerance and Monitoring," 2024 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Didcot, United Kingdom, 2024, pp. 1-6, doi: 10.1109/DFT63277.2024.10753546.
[4] Z. Chen, G. Li and K. Pattabiraman, "A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction," in IEEE Design & Test, doi: 10.1109/MDAT.2025.3618758.
[5] J. Kappes, J. Geier, P. van Kempen, D. Mueller-Gritschneder and U. Schlichtmann, "Automated Graph-level Passes for TinyML Fault Tolerance," 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy, 2025, pp. 1-9, doi: 10.1109/IJCNN64981.2025.11227379.
Voraussetzungen
Required:
- Interest in computer architecture, machine learning systems, or reliable computing
- Programming experience (e.g., C/C++ and Python, or similar)
- Experience with ML compilers such as IREE
- Motivation to work on research-oriented topics
Beneficial:
- Interest in virtual prototyping and simulation
- Experience with embedded software development
- Knowledge of machine learning methods
Kontakt
Apply with CV and Transcript of Records directly to: m.schirmer@tum.de
Betreuer:
Logic Synthesis and Internal Mechanism Analysis of PQC Accelerators using Yosys
Beschreibung
As we enter the quantum era, traditional encryption is becoming vulnerable. Post-Quantum Cryptography (PQC) is the next-generation standard for securing global data. However, designing hardware for PQC requires specialized tools. Yosys is the world’s leading open-source logic synthesis framework, used to transform hardware descriptions (Verilog) into real circuit structures.
This is a hands-on, exploration-oriented project. You will:
- Master the Tool: Set up and explore the Yosys open-source synthesis framework in a Linux environment.
- Hardware Analysis: Run logic synthesis for PQC-related modules (e.g., Keccak/SHA-3 or Kyber operators) and visualize how they are mapped into gate-level circuits.
- Internal Discovery: Peek "under the hood" of Yosys to understand how it translates high-level code into optimized hardware netlists.
- Documentation: Create a "Technical Internal Guide" that explains Yosys’s key features and workflows to help the lab adopt this tool for future chip designs.
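To give an idea of the synthesis step listed above, the following is a minimal sketch of how a Yosys run could be scripted from Python. The file name `keccak_round.v`, the top module name `keccak_round`, and both helper functions are hypothetical and chosen only for illustration; the Yosys commands themselves (`read_verilog`, `hierarchy`, `synth`, `stat`, `write_json`) and the `-s` script option are standard Yosys features.

```python
import shutil
import subprocess
from pathlib import Path


def build_synth_script(verilog_files, top, json_out="netlist.json"):
    """Return the text of a Yosys script that synthesizes `top` to generic gates."""
    lines = [f"read_verilog {path}" for path in verilog_files]
    lines += [
        f"hierarchy -check -top {top}",  # resolve the design hierarchy
        f"synth -top {top}",             # generic synthesis to Yosys' internal gate library
        "stat",                          # print cell counts of the resulting netlist
        f"write_json {json_out}",        # dump the netlist for later inspection
    ]
    return "\n".join(lines) + "\n"


def run_yosys(script_text, script_path="synth.ys"):
    """Write the script to disk and run it; returns None if yosys is not installed."""
    if shutil.which("yosys") is None:
        return None
    Path(script_path).write_text(script_text)
    return subprocess.run(
        ["yosys", "-s", script_path],
        capture_output=True, text=True, check=False,
    )


if __name__ == "__main__":
    # Hypothetical input file and module name, for illustration only.
    print(build_synth_script(["keccak_round.v"], "keccak_round"))
```

The cell statistics printed by `stat` and the JSON netlist are natural starting points for the analysis and documentation parts of the project.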
Prerequisites
- Core Knowledge: Basic understanding of Digital Logic (Gates, Flip-flops, etc.) and Verilog/SystemVerilog.
- Technical Skills: Comfortable working in a Linux command-line environment. Basic C++ or Python knowledge is a plus.
- Mindset: Curious, detail-oriented, and enjoys "breaking things down" to see how they work.
Contact
If you are interested in this thesis topic and comfortable with a remote working mode, please send your CV and academic transcript to:
zhidan.zheng@tum-create.edu.sg
Supervisor:
Student Assistants
Web-Based Digital Microfluidic (DMF) Design Platform
Description
Project Overview
Digital Microfluidics (DMF) is a cutting-edge technology that enables the precise manipulation of minute fluid volumes (droplets) via electrical actuation. We currently have a functional web-based design tool that allows researchers to create custom PCB-based and glass-based DMF chips. This platform streamlines the transition from concept to manufacturable hardware by providing features like custom electrode placement, automated routing, and experiment definition.
We are looking for motivated students to join our follow-up project. The goal is to extend the platform's functional modules and refine the core routing algorithms to handle increasingly complex chip architectures.
Tasks
As a student on this project, you will focus on two primary areas:
1. Platform Extension & Feature Enhancement
- Integrated Path Planning: Develop an automated droplet path planning feature where users can select start and end points, and the system generates the optimal movement sequence.
- Functional Module Libraries: Create templates and interfaces for specialized biological and chemical detection modules to improve design efficiency for specific experimental scenarios.
- Advanced UI/UX: Enhance the interactive editor, building upon existing features like "undo/redo," "copy/paste," and the "parallel electrode" batch processing system.
2. Routing Algorithm Refinement
- Algorithm Optimization: Work with our existing WebAssembly (WASM) and Web Worker-based routing engine to improve performance and success rates for high-density designs.
- Geometric Refinement: Modify the grid-based routing and collision detection logic to support finer electrode spacings and complex trace widths.
- Via Management: Refine the dynamic via cost mechanisms to optimize vertical interconnections between PCB layers.
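As an illustration of the path planning task above, the sketch below finds a shortest droplet route on a rectangular electrode grid using breadth-first search. It is not the platform's implementation: the function name, the grid model with four-neighbor moves, and the `blocked` set (electrodes that are occupied or reserved) are assumptions made for this example.

```python
from collections import deque


def plan_droplet_path(rows, cols, start, goal, blocked=frozenset()):
    """Return the shortest list of (row, col) electrodes from start to goal.

    Droplets move one electrode at a time to a horizontal or vertical
    neighbor. Returns None if no route exists.
    """
    if start in blocked or goal in blocked:
        return None
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and nxt not in blocked and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


if __name__ == "__main__":
    route = plan_droplet_path(3, 4, (0, 0), (2, 3), blocked={(0, 1), (1, 1)})
    print(route)
    # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```

Breadth-first search is optimal here only because every move has the same cost; weighting moves differently (for example, to keep droplets apart) would call for Dijkstra or A* instead.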
Technical Environment
You will work with a modern, high-performance tech stack:
- Frontend: Vue 3, Element Plus, and SVG for vector graphics rendering.
- Core Logic: C++ (compiled to WebAssembly) for heavy computational tasks.
- Communication: Web Serial API for real-time hardware interfacing.
- Hardware Integration: Exporting KiCad-compatible files for physical PCB manufacturing.
Requirements
- Strong interest in Electronic Design Automation (EDA) or Microfluidics.
- Proficiency in JavaScript/TypeScript (preferably Vue 3) or C++.
- Basic understanding of geometric algorithms or PCB design is a plus.
Contact
If you are interested, please contact:
Be sure to include your current transcript and CV with your message.
Supervisor:
Open Research Topic: AI for Hardware Design & Systems
AI for Systems, Hardware Design, Machine Learning, Optimization
Do you have a novel idea at the intersection of AI/ML and hardware design? We are looking for highly motivated students to propose and pursue their own research ideas in this space—from applying modern AI techniques to traditional hardware problems to exploring entirely new directions.
Description
The intersection of artificial intelligence and hardware/system design is rapidly evolving. Many traditional problems in areas such as chip design, optimization, and system architecture are being revisited with modern machine learning techniques—yet there is still vast untapped potential for new ideas.
This open topic is aimed at students who want to go beyond predefined projects and instead explore their own research direction. We are particularly interested in novel and creative approaches, including (but not limited to):
- Applying machine learning to classical hardware or EDA problems
- Reinforcement learning or optimization for system design and scheduling
- AI-driven design space exploration or co-design approaches
- Using modern paradigms such as foundation models or autonomous research/optimization agents
- Completely new ideas that challenge existing workflows or assumptions
The goal is to identify promising research directions and develop them into meaningful projects, with the potential to grow into a thesis or even a research publication.
You will work closely with supervision to refine your idea, scope the problem, and develop a concrete research plan—but the starting point should come from you.
Prerequisites
- Strong interest in research and innovation
- Familiarity with machine learning and/or systems is expected
- Ability to think independently and propose original ideas
- High motivation and curiosity
Contact
Please send:
- A short description of your idea (what you want to explore and why it is interesting)
- Your CV
- Your transcript of records
Supervisor:
Advancing Efficient LLM Inference with Compute-in-Memory
Large Language Models, Compute-in-Memory, AI Hardware, Efficient Inference
We are looking for a motivated student to join a research project on hardware architectures for efficient large language model inference. The internship offers the opportunity to work on a timely topic at the intersection of AI systems, computer architecture, and emerging memory-centric computing.
Description
Large language models continue to push modern hardware systems to their limits. In particular, data movement, memory access costs, and energy efficiency have become central bottlenecks for scalable inference. Compute-in-memory architectures offer a promising direction to address these challenges by bringing computation closer to where data resides and thereby reducing the overhead of conventional processor-centric systems.
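A back-of-the-envelope calculation shows why data movement dominates. The sketch below estimates the arithmetic intensity (operations per byte moved) of a single weight-matrix multiplication as it occurs in a transformer layer. The function name, the 4096 x 4096 layer size, and the 2-byte data types are illustrative assumptions; the model ignores caching and counts each weight and activation as moved exactly once.

```python
def arithmetic_intensity(rows, cols, batch=1, weight_bytes=2, act_bytes=2):
    """Estimate FLOPs per byte for Y = W @ X with W of shape (rows, cols).

    Counts one multiply and one add per weight and batch column, and assumes
    weights, inputs, and outputs each cross the memory interface once.
    """
    flops = 2 * rows * cols * batch
    bytes_moved = rows * cols * weight_bytes + batch * (rows + cols) * act_bytes
    return flops / bytes_moved


if __name__ == "__main__":
    # Token-by-token decoding: one input vector per weight matrix.
    print(round(arithmetic_intensity(4096, 4096, batch=1), 2))
    # Batched processing reuses each loaded weight 64 times.
    print(round(arithmetic_intensity(4096, 4096, batch=64), 2))
```

With a batch of one, the result is just under one operation per byte, so throughput is bounded by how fast weights can be fetched rather than by compute. Performing the multiply-accumulate where the weights are stored is one way to avoid that traffic.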
In this research internship, you will contribute to an ongoing project on compute-in-memory architectures for LLM inference. The focus will be on deepening and sharpening the current research by analyzing recent developments, strengthening technical comparisons, refining system-level insights, and helping consolidate the latest progress in this rapidly evolving area.
This may include:
- studying recent work on efficient LLM inference and compute-in-memory systems,
- comparing architectural approaches and identifying key design trade-offs,
- helping structure and sharpen technical arguments and insights,
- supporting the analysis of hardware, performance, and energy-efficiency aspects,
- and contributing to the further development of a research manuscript.
The internship is particularly well suited for students who are interested in academia and would like to gain hands-on experience in how research is developed and turned into a strong scientific outcome. You will work on a highly relevant topic with clear practical and research impact, and gain insight into current questions in next-generation AI hardware.
Prerequisites
- Strong interest in academic research
- Basic understanding of AI/ML and computer architecture concepts
- Strong motivation to read, understand, and structure technical research
- Independent working style and willingness to learn quickly
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
Energy-Efficient AI Systems at Scale: From Optimization Models to Next-Generation Hardware & Tools
AI Systems, Hardware-Software Co-Design, Optimization, Energy Efficiency, Chiplets
Modern AI systems are pushing hardware to its limits, requiring new approaches to efficiently scale compute, memory, and energy. In this thesis, you will explore and extend cutting-edge optimization frameworks for large-scale AI workloads, with the opportunity to shape the direction of your research, from improving solver efficiency to building interactive tools or exploring learning-based optimization strategies.
Description
Recent advances in AI/ML models (e.g., large language models) demand unprecedented compute and memory resources, making efficient system design a central challenge. New approaches, such as energy-aware co-optimization of hardware architectures and workload execution, enable significant improvements in energy-delay efficiency and scalability.
This thesis builds on such optimization-driven frameworks and opens up a range of possible research directions. Rather than prescribing a fixed path, the goal is to let you explore and define your own contribution within this space, depending on your interests.
Possible directions include (but are not limited to):
- Optimization & Algorithms
  - Improve scalability and efficiency of optimization solvers (e.g., MIQP-based approaches)
  - Develop approximation, heuristic, or hybrid optimization techniques
  - Explore alternative formulations for large-scale design space exploration
- AI for Systems / Learning-Based Methods
  - Investigate reinforcement learning or learning-based approaches for scheduling, mapping, or architecture design
  - Compare learned vs. analytical optimization strategies
- Scalable Systems & Workloads
  - Extend analyses to extremely large workloads (e.g., LLM-scale systems)
  - Study trade-offs between performance, energy, and hardware constraints
- Hardware & Architecture Exploration
  - Analyze emerging architectures such as multi-chiplet systems
  - Explore memory hierarchies, interconnects, and power management strategies
- Tooling & Visualization
  - Develop intuitive interfaces or visual analytics tools for design space exploration
  - Make complex optimization results interpretable and interactive
The work can be adapted toward a more theoretical, systems-oriented, or practical/software-driven thesis. We aim to produce publishable research results, making this an excellent opportunity for students interested in academia or advanced R&D.
Prerequisites
- Solid programming skills in Python
- Basic understanding of optimization, algorithms, or AI/ML concepts
- Interest in systems, performance, or hardware-software co-design
- Ability and motivation to quickly learn new concepts across multiple domains
- Strong analytical thinking and problem-solving skills
Contact
Reach out via ch.wolters@tum.de with your latest CV and transcript of records!
Supervisor:
ISS-based Modeling and Evaluation of the RISC-V Integrated Matrix Extension (IME)
Virtual Prototyping, RISC-V, Machine Learning, Matrix Extensions, Simulation, ISS, ISA Modeling, ETISS, Python, C++, C
This project focuses on the instruction-set simulator (ISS)–based modeling and evaluation of the RISC-V Integrated Matrix Extension (IME), which is currently under active development within the RISC-V community. The goal is to extend the ETISS simulator developed at the EDA Chair to support IME, enabling early software development, functional validation, and performance exploration of matrix-oriented workloads.
Description
Matrix extensions are a key building block for accelerating modern workloads such as machine learning, signal processing, and data analytics. There are three styles of matrix extensions in development by the RISC-V community:
- Vector-Matrix-Extension (VME)
- Integrated-Matrix-Extension (IME)
- Attached-Matrix-Extension (AME)
In this project, the student will model the proposed IME instructions at the ISA level and integrate them into the ETISS instruction set simulator using the CoreDSL2 ecosystem. This includes defining instruction encodings, assembler syntax, and functional behavior, as well as developing a host-side simulation library to emulate matrix operations.
The extended simulator will be used to evaluate the IME design through testing and benchmarking, allowing analysis of correctness, usability, and performance characteristics. The project provides hands-on experience with ISA design, virtual prototyping, and simulation-based evaluation of emerging hardware extensions, closely aligned with current industrial and academic research.
Tasks
- Study the proposed RISC-V IME specification and analyze differences compared to VME and AME approaches
- ISA-level modeling of IME instructions using CoreDSL2 (instruction encoding, assembler syntax, and semantic behavior)
- Development of a host-side simulation library (based on existing softvector libraries) to emulate matrix operations
- Integration of IME support into the ETISS simulator using the CoreDSL retargeting ecosystem
- Evaluation of the IME extension through testing, benchmarking, and example workloads
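To illustrate what the host-side simulation library might compute, the following is a minimal reference model of a tile-level matrix multiply-accumulate. It does not reproduce the semantics of the IME proposal: the function names, the int8 inputs, the int32 accumulators, and the wraparound behavior on overflow are all assumptions chosen for this sketch.

```python
def wrap_int32(value):
    """Wrap an arbitrary Python int to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def mma_tile(acc, a, b):
    """Compute acc += a @ b in place for small integer tiles.

    a is M x K and b is K x N, both holding int8 values; acc is M x N and
    holds int32 values. Tiles are plain nested lists.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(acc) != m or any(len(row) != n for row in acc):
        raise ValueError("accumulator tile has the wrong shape")
    for row in list(a) + list(b):
        if any(not -128 <= v <= 127 for v in row):
            raise ValueError("input element outside the int8 range")
    for i in range(m):
        for j in range(n):
            total = acc[i][j]
            for p in range(k):
                total += a[i][p] * b[p][j]
            acc[i][j] = wrap_int32(total)  # assumed overflow behavior
    return acc


if __name__ == "__main__":
    acc = [[0, 0], [0, 0]]
    mma_tile(acc, [[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(acc)  # [[19, 22], [43, 50]]
```

A plain reference like this is useful as a golden model: results from the instruction implementations in the simulator can be compared against it during testing.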
Reading Material
- IME Proposal (SpaceMiT): https://github.com/spacemit-com/riscv-ime-extension-spec
- Matrix Extension Proposal (T-Head): https://github.com/XUANTIE-RV/riscv-matrix-extension-spec?tab=readme-ov-file
- IME Mailing List: https://lists.riscv.org/g/tech-integrated-matrix-extension
- IME Charter: https://riscv.atlassian.net/wiki/spaces/IMEX/pages/46071925/Charter
- VME Mailing List:
- VME Charter: https://riscv.atlassian.net/wiki/spaces/VMEX/pages/663912452/Vector-Matrix+Extension+VME+Charter
- AME Mailing List: https://lists.riscv.org/g/tech-attached-matrix-extension
- AME Charter: https://riscv.atlassian.net/wiki/spaces/AMEX/pages/55083388/Charter
Prerequisites
Required:
- Proficiency in C/C++ and Python
- Basic knowledge of instruction set architectures (ISAs) (e.g., RISC-V)
- Experience with embedded software development
Beneficial:
- Interest in virtual prototyping and simulation
- Experience with compilers or code generation
- Knowledge of machine learning workloads and hardware acceleration concepts (e.g., GEMM, data layouts)
Contact
Apply with CV and Transcript of Records directly to: philipp.van-kempen@tum.de