Interested in an internship or a thesis?
New topics are often in preparation and not yet listed here. Sometimes it is also possible to define a topic matching your specific interests. Therefore, do not hesitate to contact our scientific staff if you are interested in contributing to our work. If you have further questions concerning a thesis at the institute, please contact Dr. Thomas Wild.
Duckietown – Combined RL-Based Steering and Speed Control
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found here.
In previous work, the default PID controllers for steering and speed control were each replaced independently by an LCT RL agent. The modified versions of the Duckiebot control system match their default counterparts in driving performance while using only slightly more computational resources. With some additional changes to the image processing pipeline, we also improved the accuracy of state measurements, e.g., the distance to the center of the lane. However, the two separate agents have not yet been combined, nor have they been exposed to more complex driving scenarios.
This thesis aims to have the Duckiebots' driving controlled entirely by RL. The student can achieve this either by merging our two RL agents into one or via a multi-agent approach. Since the agents' selected actions affect each other, they cannot be treated as entirely independent: for example, a stronger heading-angle correction is necessary to avoid leaving the lane when accelerating in curves.
The first step in this thesis will be to analyze the existing agents and investigate different concepts for their combination. When implementing the selected approach(es), new states and actions will likely extend the ruleset(s), which will demand more complex reward and Q-value update functions. Ideally, code optimizations will decrease the minimum possible RL cycle period. The student will compare the new system configuration with the configurations using the existing separate agents and with the baseline PID controller version regarding driving performance and computational resource utilization. A potential extension is the integration of more complex scenarios, such as intersections, pedestrian crossings, or traffic lights.
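To make the coupling between steering and speed concrete, the sketch below shows a plain tabular Q-learner over a joint (steering, speed) action space. It is a minimal, hypothetical illustration only: the action values, state encoding, and reward weights are assumptions and do not reflect the actual LCT rule representation or update rules.

```python
# Minimal sketch of a joint steering/speed Q-learning agent (hypothetical,
# not the actual LCT implementation): the action is a (steering, speed) pair,
# so the interaction between both controls lives in a single value table.
import random
from collections import defaultdict

STEERING = [-0.3, 0.0, 0.3]      # assumed heading-correction actions [rad]
SPEEDS = [0.2, 0.4, 0.6]         # assumed target speeds [m/s]
ACTIONS = [(st, sp) for st in STEERING for sp in SPEEDS]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
q_table = defaultdict(float)     # (state, action) -> Q-value

def select_action(state):
    """Epsilon-greedy selection over the joint action space."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning update; an LCT rule-set update would replace this."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                         - q_table[(state, action)])

def reward_fn(lane_offset, heading_error, speed):
    """Assumed reward: stay centered and aligned, but prefer higher speed."""
    return speed - abs(lane_offset) - 0.5 * abs(heading_error)
```

A single joint table like this makes the dependency between acceleration and heading correction explicit, at the cost of a larger state-action space; a multi-agent approach would instead keep two smaller rulesets and coordinate them.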
If the current LCT agents do not yield satisfactory results, the student could also explore other agent approaches, such as deep RL. As such methods are generally more resource-hungry, they might only be feasible by offloading parts of the control system to the Jetson Nano's GPU.
Prerequisites
- Familiarity with RL, Python, ROS, and computer vision
- Structured way of working and strong problem-solving skills
- Interest in autonomous driving and robotics
Supervisor:
Duckietown – Image Processing on GPU
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found here.
In our Duckietown lab, we have developed a stable lane following system, a CNN-based real-time object detection system, and RL extensions to the Duckiebot control system. However, since the NVIDIA Jetson Nano's CPU processing power is limited, we must carefully balance the complexity of the different components and make compromises. To allow further improvements and to speed up the execution of specific components, we now want to port parts of our software stack from the CPU to the GPU.
For the object detection system, inference has already been offloaded to the GPU. This thesis focuses on porting the lane following system, the next most resource-intensive component, to the GPU. The current implementation uses a classical computer vision approach via the OpenCV library and performs line segment detection using the LSD algorithm. Besides the changes necessary to run on the GPU at all, the student will explore the potential for parallelization via NVIDIA's CUDA API. This should reduce the CPU load significantly, allowing us to extend other parts of the overall system. Ideally, we will also be able to increase the image processing frame rate, which should improve the overall lane following performance.
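As a rough illustration of the upload/process/download pattern involved, here is a minimal sketch using OpenCV's CUDA module (assuming an OpenCV build with CUDA support, as available on the Jetson). The thresholds, filter sizes, and pipeline structure are placeholders and not our actual lane following code.

```python
# Sketch of moving the first image-processing steps to the GPU with OpenCV's
# CUDA module. Only illustrates the host -> device -> host pattern; the real
# Duckietown lane-following pipeline differs.
import cv2
import numpy as np

def detect_edges_gpu(frame_bgr: np.ndarray) -> np.ndarray:
    gpu_frame = cv2.cuda_GpuMat()
    gpu_frame.upload(frame_bgr)                      # host -> device copy

    gpu_gray = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)

    # Smooth and run Canny entirely on the GPU before any line-segment search.
    blur = cv2.cuda.createGaussianFilter(cv2.CV_8UC1, cv2.CV_8UC1, (5, 5), 1.5)
    gpu_blurred = blur.apply(gpu_gray)

    canny = cv2.cuda.createCannyEdgeDetector(50, 150)
    gpu_edges = canny.detect(gpu_blurred)

    return gpu_edges.download()                      # device -> host copy

if __name__ == "__main__":
    dummy = np.zeros((480, 640, 3), dtype=np.uint8)
    print(detect_edges_gpu(dummy).shape)
```

Note that OpenCV's CUDA module offers, for instance, a Hough-based line segment detector but, to our knowledge, no direct port of LSD, so the segment detection step itself may require a different algorithm or framework.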
A first step in this thesis will be familiarization with CUDA and an analysis of where in our code parallelization and further pipelining will be beneficial. The student should also investigate other potentially useful frameworks. After porting the main lane following system, other components could also be offloaded to the GPU; possible candidates are the RL algorithms for speed control and steering introduced to the Duckiebot in other theses. The student will compare the ported parts of our software stack with their existing CPU counterparts and evaluate their effect on the performance and resource utilization of the whole system.
With an alleviated CPU utilization and an ideally higher image processing framerate, we will enable the development of more complex Duckiebot behavior, such as safe navigation on tracks that include intersections, pedestrian crossings, and traffic lights. This will also allow us to experiment with more complex RL agents, which have so far been infeasible due to the limited computational resources of the Jetson Nano.
Prerequisites
- Familiarity with computer vision, CUDA, Python, and ROS
- Structured way of working and strong problem-solving skills
- Interest in autonomous driving and robotics
Contact
michael.meidinger@tum.de
Supervisor:
Time Synchronization & Trigger Coordination between Communication Anomaly Detection and Trace Retrieval
Description
Future cars rely on a wide variety of sensors—including cameras, LiDARs, and RADARs—that generate enormous amounts of data. This data flows through the intra-vehicular network (IVN) to processing nodes, ultimately triggering actuators. With strict timing constraints essential for vehicle safety, time-sensitive networking (TSN) is now a critical component in modern automotive systems. Within the context of the EMDRIVE project, our team is developing new monitoring and diagnostic approaches to detect errors early and maintain functional safety in highly automated driving environments.
Project Description
This project focuses on developing a synchronization mechanism between the anomaly detection unit in the ZCU102 PL (FPGA) and the Aurix TC397 ECU’s trace system (MCDS). The goal is to ensure that communication anomalies detected in the PL are tightly aligned with ECU trace captures, so that the limited 2 MB trace buffer contains the most relevant execution history. By achieving low-latency triggering and global timestamp consistency, the Diagnosis Unit (DU) can accurately correlate network-level and processing-level anomalies.
The key tasks include:
- Design and implement a low-latency handshake (PL → Aurix) using GPIO/interrupts or timestamp markers.
- Evaluate and compare synchronization methods: hardware trigger line vs. shared global clock (PTP/PS counter).
- Modify the Aurix side (via TAS or external interrupt) to latch triggers and freeze/stop tracing instantly.
- Validate synchronization accuracy by measuring drift and latency between PL anomalies and ECU trace entries (see the sketch after this list).
- Integrate synchronization metadata into DU anomaly reports.
- Allow circular trace buffer filling instead of continuous trace transfer.
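As a rough illustration of the drift and latency measurement mentioned above, the following sketch fits a linear clock model to matched PL/ECU timestamp pairs. The data layout and numbers are hypothetical; the real MCDS trace format and the PL timestamp source are part of the thesis work.

```python
# Sketch of estimating trigger latency (offset) and clock drift between PL
# anomaly timestamps and ECU trace timestamps from injected test anomalies.
import numpy as np

def estimate_offset_and_drift(pl_ts_ns: np.ndarray, ecu_ts_ns: np.ndarray):
    """Fit ecu_ts ~ offset + (1 + drift) * pl_ts for matched event pairs."""
    slope, offset = np.polyfit(pl_ts_ns, ecu_ts_ns, deg=1)
    drift_ppm = (slope - 1.0) * 1e6
    residual_ns = ecu_ts_ns - (offset + slope * pl_ts_ns)
    return offset, drift_ppm, residual_ns.std()

if __name__ == "__main__":
    # Synthetic example: 100 injected anomalies, 2 us latency, 5 ppm drift.
    pl = np.linspace(0, 1e9, 100)
    ecu = 2_000 + (1 + 5e-6) * pl + np.random.normal(0, 50, 100)
    print(estimate_offset_and_drift(pl, ecu))
```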
Key Responsibilities:
- Collaborate with interdisciplinary teams to integrate and test the complete system.
- Develop and integrate a PL module for trigger/timestamp generation.
- Extend TAS/PS software to handle trace stop commands upon triggers.
- Configure Aurix trace system (MCDS) to respond to triggers or timestamps.
- Perform experiments and validation with injected anomalies to measure synchronization precision.
- Document the synchronization design and provide usage guidelines for future DU setups.
Prerequisites
Required Skills:
- Digital design knowledge (FPGA/PL) and Verilog/VHDL or HLS.
- Embedded C/C++ programming for Aurix TC397 (MCDS configuration) and Zynq PS.
- Familiarity with synchronization protocols (interrupt handling, timestamping, PTP).
- Skills in measurement/validation setups (logic analyzers, latency measurement).
- Basic knowledge of automotive IVN diagnostics is beneficial.
Benefits:
- Hands-on experience with hardware-software synchronization in real automotive ECUs.
- Deep understanding of PL ↔ PS ↔ ECU interaction (critical in modern heterogeneous SoCs).
- Contribution to improving accuracy and reliability of the Diagnosis Unit’s anomaly detection.
- Opportunity to validate and publish real-time synchronization results in research/industry contexts.
- Practical learning that directly applies to automotive gateway/NIC design.
Contact
Zafer Attal
Chair of Integrated Systems
Arcisstraße 21, 80333 Munich
Tel. +49 89 289 23853
zafer.attal@tum.de
www.lis.ei.tum.de
Supervisor:
Duckietown – Improved Object Detection for Autonomous Driving
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found here.
In previous work, an object detection system based on the YOLO (You Only Look Once) algorithm was developed for our Duckiebots. It detects three classes of objects (rubber duckies, other Duckiebots, and stop signs) inside camera images, and lets our robots react appropriately. The best-suited model proved to be YOLOv11n and was fine-tuned using the dataset provided by Duckietown. Evaluation with a smaller custom dataset that includes images of the updated robot design showed the approach's viability. Detection is usually reliable, but less so for other Duckiebots than duckies or stop signs. Model inference has been offloaded to the GPU of the NVIDIA Jetson Nano board powering our Duckiebot, making the total time for an object detection step short enough to have our robots react to objects in time while driving.
As one part of this follow-up project, the reaction logic should be overhauled and enhanced. The current implementation only considers the distance to the object, estimated from the dimensions of the classification bounding box, and a predefined threshold for the position inside the camera image. This leads to undesired situations, e.g., stopping when another Duckiebot is approaching on the opposite lane. An additional feature could be the careful circumnavigation of duckies on the side of the road.
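As a hypothetical illustration of such an enhanced reaction logic, the sketch below combines a bounding-box-based distance estimate with the lateral position of the detection to ignore objects on the opposite lane. All constants and thresholds are placeholders, not values from our implementation.

```python
# Hypothetical reaction decision: estimate distance from bounding-box height
# and only react to objects that lie in the robot's own lane.
from dataclasses import dataclass

@dataclass
class Detection:
    cls: str            # "duckie", "duckiebot", or "stop_sign"
    x_center: float     # normalized [0, 1] horizontal position in the image
    box_height: float   # normalized [0, 1] bounding-box height

FOCAL_SCALE = {"duckie": 0.04, "duckiebot": 0.10, "stop_sign": 0.06}  # assumed

def estimated_distance_m(det: Detection) -> float:
    """Pinhole-style estimate: apparent height shrinks with distance."""
    return FOCAL_SCALE[det.cls] / max(det.box_height, 1e-3)

def in_own_lane(det: Detection) -> bool:
    """With right-hand traffic, the own lane covers roughly the right half of
    the image; an oncoming Duckiebot appears further to the left."""
    return det.x_center > 0.45

def react(det: Detection) -> str:
    if not in_own_lane(det):
        return "continue"        # e.g. Duckiebot approaching on the opposite lane
    if estimated_distance_m(det) < 0.3:
        return "stop"
    return "slow_down"
```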
Furthermore, we also want to include more detected classes, e.g., stop lines or other traffic signs, and the recognition of general obstacles. This will require adaptations to both the detection step itself, potentially in a hybrid manner that combines it with different camera data or measurements from the robot's distance sensor, and to the reaction logic. As we rely more on object detection, an extended dataset will likely be necessary. The current custom dataset contains only a few images annotated for the classes above and does not cover many environmental conditions, such as lighting. With a sufficient number of varied images, fine-tuning on it might become possible, ideally bringing the detection precision closer to the results obtained with the open-source dataset.
Since extensions to the system will increase its complexity, another focus of this project should be optimizing processing resource utilization and detection speed. Pre- and postprocessing should be offloaded to the GPU, as already done for inference. Pipelining of the detection steps could increase the publishing frequency of detection results. As a side effect, reducing the CPU load will allow us to extend other parts of our whole system. To improve our setup in this regard, the student should further explore available frameworks. Using an external accelerator or a centralized processing unit instead is also worth investigating.
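As one possible direction, preprocessing could be moved to the GPU with a tensor framework such as PyTorch. The sketch below is an illustration under stated assumptions (fixed input size, simple normalization, no letterbox padding) rather than our deployed pipeline.

```python
# Sketch of GPU-side YOLO preprocessing with PyTorch: transfer, layout change,
# scaling, and resizing all happen on the device instead of the CPU.
import numpy as np
import torch

def preprocess_gpu(frame_bgr: np.ndarray, size: int = 416) -> torch.Tensor:
    t = torch.from_numpy(frame_bgr).to("cuda", non_blocking=True)  # HWC, uint8
    t = t.permute(2, 0, 1).unsqueeze(0).float() / 255.0            # NCHW in [0, 1]
    t = t.flip(1)                                                  # BGR -> RGB
    return torch.nn.functional.interpolate(
        t, size=(size, size), mode="bilinear", align_corners=False)
```

A real pipeline would additionally preserve the aspect ratio (letterboxing) and keep postprocessing such as non-maximum suppression on the GPU as well.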
The enhanced detection system will finally lay the foundation for more complex Duckiebot behavior, such as safe navigation on tracks that include intersections or pedestrian crossings, or can be combined with our reinforcement learning agents.
Prerequisites
- Familiarity with Python, ROS, neural networks, and computer vision
- Structured way of working and strong problem-solving skills
- Interest in autonomous driving and robotics
Contact
michael.meidinger@tum.de
Supervisor:
FPGA-Based Design and Implementation of Dynamic Preloading Features
VHDL, C Programming, Distributed Memory, Data Migration, Task Migration, Hardware Accelerator
Description
Its main advantages, a simple design with only one transistor per bit and a high memory density, make DRAM omnipresent in most computer architectures. However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles. In order to reduce the DRAM access latency, memory prefetching is a common technique to access data prior to its actual usage. However, this requires sophisticated prediction algorithms in order to prefetch the right data at the right time.
The goal of this thesis is to refine an existing DRAM preloading mechanism on an FPGA-based prototype platform. The refined mechanism should be able to preload different pages alternately and to switch dynamically between two pages. This requires sophisticated changes in several components, the FSMs, and the tag memory of the hardware preload unit.
Towards this goal, you'll complete the following tasks:
1. Understand the existing memory access and preloading mechanism
2. Implement the refined preloading functionality in VHDL
3. Write and execute small bare-metal test programs
4. Analyze and discuss the performance results
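To make the intended behavior more tangible, the following is a purely conceptual Python model of a preload unit whose tag memory tracks two candidate pages and switches between them on demand. It is not the VHDL design; page size, naming, and the switching policy are assumptions for illustration.

```python
# Conceptual model of a two-page preload unit: the tag memory remembers which
# page currently sits in the fast preload buffer, and an access to the other
# tracked page triggers a switch instead of a full DRAM round trip each time.
PAGE_SIZE = 4096

class PreloadUnit:
    def __init__(self):
        self.tag = None          # page number currently preloaded
        self.candidates = []     # last two distinct pages observed

    def access(self, addr: int) -> str:
        page = addr // PAGE_SIZE
        if page == self.tag:
            return "hit (served from preload buffer)"
        if page not in self.candidates:
            self.candidates = (self.candidates + [page])[-2:]
        self.tag = page          # switch the preloaded page
        return f"miss (preloading page {page}, candidates {self.candidates})"

if __name__ == "__main__":
    pl = PreloadUnit()
    for a in (0x0000, 0x0010, 0x1000, 0x0020, 0x1040):
        print(hex(a), "->", pl.access(a))
```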
Prerequisites
- Good knowledge of MPSoCs
- Good VHDL skills
- Good C programming skills
- High motivation
- Self-responsible workstyle
Contact
Oliver Lenke
o.lenke@tum.de
Supervisor:
Development of an FPGA-based Packet Fragmenter
Description
Linear Network Coding is a block based Forward Error Correction (FEC) scheme currently being investigated by LIS for use in future resilient networks. FEC allows for recovery from packet errors or packet loss, without the need for retransmission, and therefore leads to increased reliability and lower latencies in lossy networks. Given that many networks, including the Internet, are lossy, these properties are highly relevant for modern computer networking.
The Network Coding FEC variant in particular provides packet-level recovery. As with other block codes, a number of repair symbols are generated from a block of information and transmitted along with the original data. In case of errors, the receiver can use these repair symbols to recover the original information. However, for Network Coding the symbols are packets instead of bits or bytes, which makes the method better suited for dealing with packet loss.
A challenge of Network Coding is the requirement that all packets in one block have the same length before encoding. Packets of unequal length therefore have to be padded to the length of the longest packet in the block. This padding adds (often significant) transmission overhead and reduces the effective available bandwidth.
In order to solve this issue, we want to split packets into fragments which may be coded in blocks of shorter lengths. By using fragmentation, the average padding overhead can be decreased and the overall coding efficiency increased. This has so far been tested in simulations, but we also want to evaluate the improvements to the FEC scheme in our testbed using an FPGA-based prototype.
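To see why fragmentation reduces the padding overhead, consider the following back-of-the-envelope sketch. Fragment size and per-fragment header cost are assumed values, and the model deliberately ignores coding-specific details.

```python
# Rough comparison: padding every packet of a block to the longest packet vs.
# splitting packets into fixed-size fragments, where padding only occurs in
# the last fragment of each packet (plus a small per-fragment header).
def padding_overhead_unfragmented(pkt_lens):
    block_max = max(pkt_lens)
    return sum(block_max - l for l in pkt_lens)

def overhead_fragmented(pkt_lens, frag_size=256, frag_hdr=4):
    overhead = 0
    for l in pkt_lens:
        n_frags = -(-l // frag_size)            # ceiling division
        overhead += n_frags * frag_hdr          # per-fragment header cost
        overhead += n_frags * frag_size - l     # padding in the last fragment
    return overhead

if __name__ == "__main__":
    block = [1500, 1500, 400, 120, 64]          # example packet lengths [bytes]
    print("padding only:", padding_overhead_unfragmented(block))
    print("fragmented  :", overhead_fragmented(block))
```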
The goal of this thesis is to develop an FPGA-based packet fragmentation system (including fragmenter and defragmenter), verify its correctness, and evaluate its performance. If time allows, this should also be incorporated into our SmartNIC platform and tested in-network.
Supervisor:
Design and Implementation of Dynamic Preloading Features on an FPGA Prototype
VHDL, C Programming, Distributed Memory, Data Migration, Task Migration, Hardware Accelerator
Description
Its main advantages, a simple design with only one transistor per bit and a high memory density, make DRAM omnipresent in most computer architectures. However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles. In order to reduce the DRAM access latency, memory prefetching is a common technique to access data prior to its actual usage. However, this requires sophisticated prediction algorithms in order to prefetch the right data at the right time.
The goal of this thesis is to refine an existing DRAM preloading mechanism on an FPGA-based prototype platform. The refined mechanism should be able to preload different pages alternately and to switch dynamically between two pages. This requires sophisticated changes in several components, the FSMs, and the tag memory of the hardware preload unit.
Towards this goal, you'll complete the following tasks:
1. Understand the existing memory access and preloading mechanism
2. Implement the refined preloading functionality in VHDL
3. Write and execute small bare-metal test programs
4. Analyze and discuss the performance results
Prerequisites
- Good knowledge of MPSoCs
- Good VHDL skills
- Good C programming skills
- High motivation
- Self-responsible workstyle
Contact
Oliver Lenke
o.lenke@tum.de