The PULP Platform and Its Efforts Around the CVA6 RISC-V Core
Description
The Parallel Ultra-Low Power (PULP) platform is an open-source hardware and software ecosystem developed by ETH Zürich and the University of Bologna to explore energy-efficient computing. It provides scalable multi-core architectures, SoC components, and toolchains designed for applications where power consumption is critical, such as edge AI, IoT, and embedded sensing.
At its core, PULP focuses on parallelism and low-power techniques, combining lightweight RISC-V processors, tightly coupled memory hierarchies, and domain-specific accelerators. The modular and flexible platform enables researchers and developers to prototype custom system-on-chip designs while leveraging a growing suite of open-source IP blocks, including the CVA6 RISC-V core.
This CPU core, formerly known as Ariane, is a 64-bit, in-order application-class processor with a six-stage pipeline that implements the RISC-V RV64GC instruction set. Though optimized for energy efficiency, it is powerful enough to boot operating systems such as Linux or FreeRTOS. With standard interfaces like AXI for memory and peripheral access, a rich toolchain, and a release under the permissive Solderpad license, the CVA6 core is well suited for system-on-chip integration in research.
In this seminar topic, the student should further investigate the PULP platform, its contributions, including the CVA6 core, and especially its recent projects on novel system architectures. Examples of case studies are the lightweight Cheshire and coherence-focused Culsans platforms, or the Occamy chiplet system comprising Snitch clusters. Besides the conceptual aspects, their performance, resource utilization, and tapeout characteristics should be analyzed. Another focus of this seminar should be the toolchains provided by the PULP platform and the workflow of integrating, adapting, and verifying their designs in other projects.
Possible starting points for literature research are listed below.
The PULP Platform and Its Efforts Around Chiplet Systems
Description
The Parallel Ultra-Low Power (PULP) platform is an open-source hardware and software ecosystem developed by ETH Zürich and the University of Bologna to explore energy-efficient computing. It provides scalable multi-core architectures, SoC components, and toolchains designed for applications where power consumption is critical, such as edge AI, IoT, and embedded sensing.
At its core, PULP focuses on parallelism and low-power techniques, combining lightweight RISC-V processors, tightly coupled memory hierarchies, and domain-specific accelerators. The modular and flexible platform enables researchers and developers to prototype custom system-on-chip designs while leveraging a growing suite of open-source IP blocks, including the CVA6 RISC-V core and novel system architectures.
In this seminar topic, the student should specifically investigate PULP's efforts around chiplet-based systems, such as the Occamy system comprising Snitch clusters. Another focus should be the PULP Serial Link as an open, simple, and parameterizable (chiplet) interconnect.
For comparison, the literature research should also cover other open-source platforms exploring chiplet systems. Similarly, the student should compare the Serial Link to other open-source or industry chiplet interconnects, such as the Universal Chiplet Interconnect Express (UCIe). Relevant quantitative metrics include achievable bandwidth ranges, latency, power consumption, and chip area/pin count requirements. Qualitative aspects such as licensing, availability of toolchains or IP cores, successful tapeouts, and development complexity should be analyzed as well.
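To make such a comparison concrete, the raw per-link numbers reduce to simple arithmetic. The sketch below shows how achievable bandwidth follows from lane count, per-lane data rate, and protocol efficiency; all figures are illustrative placeholders, not actual Serial Link or UCIe specifications.

```python
# Back-of-envelope comparison of die-to-die link bandwidth.
# All numbers below are illustrative placeholders, NOT actual
# Serial Link or UCIe specifications.

def effective_bandwidth_gbps(lanes: int, rate_gbps_per_lane: float,
                             efficiency: float) -> float:
    """Raw lane bandwidth scaled by protocol/encoding efficiency."""
    return lanes * rate_gbps_per_lane * efficiency

# Hypothetical configurations for a qualitative comparison:
configs = {
    "simple source-synchronous link": dict(lanes=8,  rate_gbps_per_lane=1.0,  efficiency=0.90),
    "advanced-packaging PHY":         dict(lanes=64, rate_gbps_per_lane=16.0, efficiency=0.80),
}

for name, cfg in configs.items():
    bw = effective_bandwidth_gbps(**cfg)
    print(f"{name}: {bw:.1f} Gb/s over {cfg['lanes']} data pins")
```

The same three parameters also expose the pin-count trade-off: a wide parallel link buys bandwidth with area and package pins, a narrow serial link saves pins at the cost of latency and rate.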
Possible starting points for literature research are listed below.
Duckietown – Combined RL-Based Steering and Speed Control
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found on the project's website.
In previous work, the default PID controllers for steering and speed control were each replaced independently by LCT RL agents. The modified versions of the Duckiebot control system match the default versions in terms of driving performance while using only slightly more computational resources. With some additional changes to the image processing pipeline, we also improved the accuracy of state measurements, e.g., the distance to the center of the lane. However, the two separate agents have not been combined so far and have not been exposed to more complex driving scenarios.
This thesis aims to have the Duckiebots' driving be controlled entirely by RL. The student can achieve this by either merging our two RL agents into one or via a multi-agent approach. As their selected actions affect each other, they cannot be treated as entirely independent. For example, a more substantial heading angle correction is necessary to avoid leaving the lane when accelerating in curves.
The first step in this thesis will be to analyze the existing agents and investigate different concepts for their combination. When implementing the selected approach(es), new states and actions will likely extend the ruleset(s), which will demand more complex reward and Q-value update functions. Ideally, code optimizations will decrease the minimum possible RL cycle period. The student will compare the new system configuration to those with the existing separate agents and the baseline PID controller version regarding driving performance and computational resource utilization. A potential extension is the integration of more complex scenarios, such as intersections, pedestrian crossings, or traffic lights.
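One possible combination concept, merging both agents into a single learner over a joint action space, can be sketched as a plain tabular Q-update. The states, actions, reward, and learning parameters below are hypothetical stand-ins for illustration, not the lab's actual LCT implementation.

```python
# Minimal tabular Q-learning sketch over a JOINT steering+speed action
# space. Illustrates the combination concept only; states, actions,
# rewards, and parameters are hypothetical, not the actual LCT agents.

ALPHA, GAMMA = 0.1, 0.9
steering_actions = (-1, 0, +1)      # heading correction: left/none/right
speed_actions = (0.2, 0.4, 0.6)     # target speed in m/s (assumed values)
joint_actions = [(st, sp) for st in steering_actions for sp in speed_actions]

Q = {}  # (state, joint_action) -> value, default 0.0

def update(state, action, reward, next_state):
    """Standard Q-value update, now over the joint action space."""
    best_next = max(Q.get((next_state, a), 0.0) for a in joint_actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

# Example step: reward a stronger heading correction while accelerating
# in a curve, as described above.
state, next_state = ("curve", "off_center"), ("curve", "centered")
update(state, (+1, 0.4), reward=1.0, next_state=next_state)
print(Q[(state, (1, 0.4))])  # 0.1 after the first update (all Q start at 0)
```

Note how the joint table grows multiplicatively with the two action sets (here 3 x 3 = 9 actions per state), which is exactly why a multi-agent alternative, where each agent conditions on the other's last action, may be worth investigating.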
If the current LCT agents do not yield satisfactory results, the student could also explore other agent approaches, such as deep RL. As such methods are generally more resource-hungry, they might only be feasible by offloading parts of the control system to the Jetson Nano's GPU.
Prerequisites
Familiarity with RL, Python, ROS, and computer vision
Structured way of working and strong problem-solving skills
Interest in autonomous driving and robotics
Duckietown – GPU Port of the Lane Following System
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found on the project's website.
In our Duckietown lab, we have developed a stable lane following system, a CNN-based real-time object detection system, and RL extensions to the Duckiebot control system. However, since the NVIDIA Jetson Nano's CPU processing power is limited, we must carefully balance the complexity of the different components and make compromises. To allow further improvements and to speed up the execution of specific components, we now want to port parts of our software stack from the CPU to the GPU.
For the object detection system, inference has already been offloaded to the GPU. This thesis focuses on porting the lane following system, the next most resource-intensive component, to the GPU. The current implementation uses a classical computer vision approach via the OpenCV library and performs line segment detection using the LSD algorithm. Besides the changes necessary to run on the GPU at all, the student will explore the potential for parallelization via NVIDIA's CUDA API. Through this, the CPU load should be reduced significantly, allowing us to extend other parts of the overall system. Ideally, we will also be able to increase the image processing framerate, which should improve the overall lane following performance.
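The reason such a port can pay off is that much of the image processing is per-pixel and therefore data-parallel. The NumPy sketch below illustrates the block-wise structure a CUDA kernel launch would exploit; it is a conceptual stand-in, not the actual OpenCV/LSD pipeline.

```python
import numpy as np

def binarize(gray, thresh=128):
    """Per-pixel threshold: each output pixel depends only on its own
    input pixel, so the work maps directly onto CUDA threads."""
    return (gray > thresh).astype(np.uint8)

def binarize_blocked(gray, blocks=4, thresh=128):
    """Same result, computed in independent row blocks: the structure a
    CUDA launch with one thread block per image slice would use."""
    parts = np.array_split(gray, blocks, axis=0)
    return np.concatenate([binarize(p, thresh) for p in parts], axis=0)

# The blocked version is bit-identical to the whole-image version,
# which is what makes the parallel decomposition safe.
img = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
assert np.array_equal(binarize(img), binarize_blocked(img))
```

Operations with neighborhood dependencies (gradients, line fitting) need halo rows between blocks, which is one of the analysis points for the first step of the thesis.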
A first step in this thesis will be familiarization with CUDA and an analysis of where in our code parallelization and further pipelining will be beneficial. The student should also investigate other potentially useful frameworks. After porting the main lane following system, other components could also be offloaded to the GPU. Possible candidates are the RL algorithms for speed control and steering introduced to the Duckiebot in other theses. The student will compare the ported parts of our software stack to their existing CPU counterparts and evaluate their effect on the performance and resource utilization of the whole system.
With an alleviated CPU utilization and an ideally higher image processing framerate, we will enable the development of more complex Duckiebot behavior, such as safe navigation on tracks that include intersections, pedestrian crossings, and traffic lights. This will also allow us to experiment with more complex RL agents, which have so far been infeasible due to the limited computational resources of the Jetson Nano.
Prerequisites
Familiarity with computer vision, CUDA, Python, and ROS
Structured way of working and strong problem-solving skills
Duckietown – Improved Object Detection for Autonomous Driving
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found on the project's website.
In previous work, an object detection system based on the YOLO (You Only Look Once) algorithm was developed for our Duckiebots. It detects three classes of objects (rubber duckies, other Duckiebots, and stop signs) inside camera images, and lets our robots react appropriately. The best-suited model proved to be YOLOv11n and was fine-tuned using the dataset provided by Duckietown. Evaluation with a smaller custom dataset that includes images of the updated robot design showed the approach's viability. Detection is usually reliable, but less so for other Duckiebots than duckies or stop signs. Model inference has been offloaded to the GPU of the NVIDIA Jetson Nano board powering our Duckiebot, making the total time for an object detection step short enough to have our robots react to objects in time while driving.
As one part of this follow-up project, the reaction logic should be overhauled and enhanced. The current implementation only considers the distance to the object, estimated via the classification bounding box dimensions, and a predefined threshold for the position inside the camera image. This leads to undesired situations, e.g., stopping when another Duckiebot is approaching on the opposite lane. An additional feature could be the careful circumnavigation of duckies on the side of the road.
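As a concrete illustration of this kind of logic and one possible refinement, the sketch below estimates distance from the bounding-box height via a pinhole model and ignores detections in the oncoming half of the image. Focal length, object height, image width, and all thresholds are hypothetical values, not calibrated Duckiebot parameters.

```python
# Sketch of bounding-box-based reaction logic: distance via a pinhole
# model, plus a horizontal-position check so Duckiebots on the opposite
# lane do not trigger a stop. All constants are assumed, not calibrated.

FOCAL_PX = 300.0       # camera focal length in pixels (assumed)
BOT_HEIGHT_M = 0.12    # real height of a Duckiebot (assumed)
IMG_WIDTH = 640

def estimate_distance_m(bbox_height_px: float) -> float:
    """Pinhole model: distance = f * H_real / h_pixels."""
    return FOCAL_PX * BOT_HEIGHT_M / bbox_height_px

def should_stop(bbox, stop_dist_m=0.3):
    """bbox = (x_min, y_min, x_max, y_max). Detections whose center
    lies in the left (oncoming) half of the image are ignored."""
    x_center = (bbox[0] + bbox[2]) / 2
    if x_center < IMG_WIDTH / 2:       # opposite lane: keep driving
        return False
    return estimate_distance_m(bbox[3] - bbox[1]) < stop_dist_m

print(should_stop((400, 100, 520, 260)))  # close, own lane -> True
print(should_stop((80, 100, 200, 260)))   # oncoming lane  -> False
```

A real overhaul would replace the static image-half check with lane geometry from the lane following pipeline, but the structure of the decision stays the same.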
Furthermore, we also want to include more detected classes, e.g., stop lines or other traffic signs, as well as the recognition of general obstacles. This will require adaptations both to the actual detection step, potentially in a hybrid manner by combining it with different camera data or measurements from the robot's distance sensor, and to the reaction logic. As we rely more on object detection, an extended dataset will likely become necessary. The current custom dataset contains only a few images annotated for the classes above and does not cover many environmental conditions, such as lighting. With a sufficient number of varied images, fine-tuning on it might become possible, ideally bringing the detection precision closer to the results obtained with the open-source dataset.
Since extensions to the system will increase its complexity, another focus of this project should be optimizing processing resource utilization and detection speed. Pre- and postprocessing should be offloaded to the GPU, as has already been done for inference. Pipelining the detection steps could increase the publishing frequency of detection results. As a side effect, reducing the CPU load will allow us to extend other parts of the overall system. To improve our setup in this regard, the student should further explore available frameworks. Using an external accelerator or a centralized processing unit instead is also worth investigating.
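The pipelining idea can be sketched with plain queues and threads: preprocessing of one frame then overlaps with inference on the previous one, raising throughput without touching per-frame latency. The stage bodies below are string placeholders standing in for the real GPU work.

```python
import queue, threading

# Conceptual three-stage detection pipeline. Each stage runs in its own
# thread and hands frames downstream via a bounded queue, so stages for
# different frames execute concurrently. Stage bodies are placeholders.

def stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:        # poison pill: propagate shutdown
            outbox.put(None)
            return
        outbox.put(fn(item))

preprocess = lambda f: f + "|pre"
infer      = lambda f: f + "|inf"
postproc   = lambda f: f + "|post"

q0, q1, q2, q3 = (queue.Queue(maxsize=2) for _ in range(4))
for fn, qi, qo in ((preprocess, q0, q1), (infer, q1, q2), (postproc, q2, q3)):
    threading.Thread(target=stage, args=(fn, qi, qo), daemon=True).start()

for frame in ("frame0", "frame1", "frame2"):
    q0.put(frame)
q0.put(None)

results = []
while (r := q3.get()) is not None:
    results.append(r)
print(results)  # each frame passes through all three stages, in order
```

The bounded queues (`maxsize=2`) provide backpressure so a slow stage cannot make the robot buffer unboundedly many stale frames.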
The enhanced detection system will finally lay the foundation for more complex Duckiebot behavior, such as safe navigation on tracks that include intersections or pedestrian crossings, or can be combined with our reinforcement learning agents.
Prerequisites
Familiarity with Python, ROS, neural networks, and computer vision
Structured way of working and strong problem-solving skills
Extended SystemC Model for Design Space Exploration of a Chiplet-Based System
Description
In the BCDC project, a working group at TUM collaborates on designing a RISC-V-based chiplet demonstration chip, of which at least two will be connected via an interposer to simulate a system of interconnected chiplets. At LIS, we work on a high-performance, low-latency chiplet interconnect with additional application-specific features managed by a smart protocol controller. It closes the gap between the underlying physical layer that takes care of data transmission across the interposer and the system bus that attaches the inter-chiplet interface to the other components of the demonstration chip.
In previous work, a high-level simulation of our system has been set up using SystemC Transaction-Level Modeling (TLM). The model represents chiplets, an additional FPGA, and their interconnect in a configurable manner. The user can record a wide range of statistics for the evaluation of different system configurations and design details. This follow-up Master’s thesis serves to extend the model to bring it closer to a real system and to enhance its functionality as a design exploration tool.
As a first step, the overall architecture of the chiplet model is to be reworked. A more realistic system bus, oriented on AXI4, including bursts and congestion handling mechanisms, is to be implemented. Other parts of the TUM demonstration chip, like hardware accelerators with local memory, are to be added.
The capabilities of the interconnect protocol should be extended, e.g., by read/write operations to other memories besides the main chiplet RAM, DMA support, or application-specific extra features as provided by the smart protocol controller. This process will be based on a more formal specification of the interconnect protocol. The modeling of the interconnect itself should also involve further details specific to the investigated standards and custom solutions.
With a more complex chiplet model, more configuration options should be offered. An example could be the instantiation of a pure memory chiplet. The type of interconnect between individual chiplets should also be configurable beyond the currently supported parameters. All of these added possibilities will rely on an overhauled inter-chiplet routing mechanism.
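A minimal illustration of such a configurable routing mechanism, written in plain Python rather than SystemC TLM, might look as follows; the chiplet names, address map, and link types are hypothetical, and the real model would route TLM transactions instead of returning tuples.

```python
# Conceptual sketch of configurable inter-chiplet routing: an address
# map decides the destination chiplet, a link table decides how to get
# there. Names, addresses, and link types are hypothetical.

class System:
    def __init__(self):
        self.addr_map = []      # list of (start, end, chiplet_id)
        self.links = {}         # (src, dst) -> link type

    def add_chiplet(self, cid, base, size):
        self.addr_map.append((base, base + size, cid))

    def connect(self, a, b, link="serial"):
        self.links[(a, b)] = self.links[(b, a)] = link

    def route(self, src, addr):
        """Return (destination chiplet, link type) for a memory access."""
        for start, end, cid in self.addr_map:
            if start <= addr < end:
                if cid == src:
                    return cid, "local"
                return cid, self.links[(src, cid)]
        raise ValueError(f"unmapped address {addr:#x}")

sys_model = System()
sys_model.add_chiplet("cpu0", 0x0000_0000, 0x1000_0000)
sys_model.add_chiplet("mem0", 0x1000_0000, 0x1000_0000)  # pure memory chiplet
sys_model.connect("cpu0", "mem0", link="serial")
print(sys_model.route("cpu0", 0x1800_0000))  # ('mem0', 'serial')
```

Making the link table per-pair is what lets the interconnect type between individual chiplets be configured beyond global parameters, as described above.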
Furthermore, the existing user code applications should be supplemented by ones utilizing the added hardware accelerators or benefiting from multi-core operation.
On the side of usability improvements, further statistics should be collected, automatically processed, and visualized to simplify the evaluation of the explored system configurations. Ultimately, the simulation should help identify the benefits and drawbacks of these configurations and support a future HDL implementation.
Prerequisites
Understanding of chiplet architectures, especially their interconnect
Experience with SystemC TLM
Structured and independent way of working and strong problem-solving skills
Duckietown – DuckieVisualizer Extension and System Maintenance
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found on the project's website.
In previous work, we developed a tool called DuckieVisualizer to monitor our Duckiebots, evaluate their driving performance, and visualize and interact with the actively learning RL agents.
This student assistant position will involve extending the tool and its respective interfaces on the robot side with further features, e.g., support for more complex learning algorithms or additional driving statistics. The underlying camera processing program should also be ported from Matlab to a faster programming language to enable real-time robot tracking. Furthermore, more robust Duckiebot identification mechanisms should be considered.
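As a small illustration of what the ported tracking step might compute, the NumPy sketch below finds the centroid of a marker blob in a binary mask; the actual camera processing and marker scheme may differ.

```python
import numpy as np

# Stand-in for one tracking step: given a boolean mask of pixels that
# match a robot's marker color, return the marker centroid in pixel
# coordinates. The marker scheme itself is a hypothetical example.

def marker_centroid(mask: np.ndarray):
    """Centroid (row, col) of all True pixels, or None if none exist."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return float(ys.mean()), float(xs.mean())

frame = np.zeros((120, 160), dtype=bool)
frame[40:44, 100:104] = True           # a 4x4 marker blob
print(marker_centroid(frame))          # (41.5, 101.5)
```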
Besides these extensions to the DuckieVisualizer, the student will also take on general system maintenance tasks. These may concern the hardware of the Duckiebots as well as their software stack, for example, merging different sub-projects and implementing quality-of-life improvements to the Docker-based build process. Another task will be to help newly starting students set up their development environment and to assist them in their first steps. Finally, the student can get involved in expanding our track and adding new components, e.g., intersections or duckie pedestrian crossings.
Prerequisites
Understanding of networking and computer vision
Experience with Python, ROS, and GUI development
Familiarity with Docker and Git
Structured way of working and strong problem-solving skills
Duckietown – Improved Distance Measurement for Obstacle Avoidance and Platooning
Description
At LIS, we leverage the Duckietown hardware and software ecosystem to experiment with our reinforcement learning (RL) agents, known as learning classifier tables (LCTs), as part of the Duckiebot control system. More information on Duckietown can be found on the project's website.
We use a Duckiebot's Time-of-Flight (ToF) sensor to measure the distance to objects in front of the robot. This allows it to stop before crashing into obstacles. The distance measurement is also used in our platooning mechanism. When another Duckiebot is detected via its rear dot pattern, the robot can adjust its speed to follow the other Duckiebot at a given distance.
Unfortunately, the measurement region of the integrated ToF sensor is very narrow. It only detects objects reliably in a cone of about 5 degrees in front of the robot. Objects outside this cone, either too far to the side or too high/low, cannot reflect the emitted laser beam back to the sensor's collector, leading to crashes. The distance measurement is also fairly noisy, with accuracy decreasing at greater distances, larger angular offsets from the sensor, and on uneven reflection surfaces. As a result, the distance to the other Duckiebot is often measured incorrectly in platooning mode, causing the robot to react with unexpected maneuvers and to lose track of the leading robot.
In this student assistant project, the student will investigate how to resolve these issues. After analyzing the current setup, different sensors and their placement on the robot's front should be considered. Adding a new sensor to the Duckiebot system will require a suitable driver and some hardware adaptations. Finally, the student will integrate the improved distance measurement setup into our Python/ROS-based autonomous driving pipeline, evaluate it in terms of measurement region and accuracy, and compare the new setup to the baseline.
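Independent of the chosen sensor, simple software-side robustness measures can already help. The sketch below rejects physically implausible samples and median-filters a sliding window; the range limits and window size are assumed values, not tuned parameters from the actual Duckiebot stack.

```python
import statistics

# Simple robustness measures for noisy ToF readings: drop implausible
# samples, then take the median of a short sliding window. Constants
# are assumed values, not tuned parameters from the real system.

MIN_M, MAX_M = 0.02, 2.0    # plausible ToF range in meters (assumed)
WINDOW = 5

def filtered_distance(samples):
    """Median of the plausible samples in the last WINDOW readings,
    or None if the sensor produced no usable reading recently."""
    valid = [s for s in samples[-WINDOW:] if MIN_M <= s <= MAX_M]
    return statistics.median(valid) if valid else None

readings = [0.31, 0.30, 8.5, 0.29, 0.33]   # 8.5 m is a glitch/outlier
print(filtered_distance(readings))          # median of the 4 valid samples
```

Such filtering trades a little latency (the window delay) for stability, which matters for the platooning controller that consumes these distances.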
These modifications should allow us to avoid crashes more reliably and enhance our platooning mode, which will be helpful for further development, especially when moving to more difficult-to-navigate environments, e.g., tracks with intersections and sharp turns.
Prerequisites
Basic understanding of sensor technology and data transmission protocols
Experience or motivation to familiarize yourself with Python and ROS
Structured way of working and strong problem-solving skills
Interest in autonomous driving and robotics
Contact
michael.meidinger@tum.de
Supervisor:
Michael Meidinger
Contact
flo.maurer@tum.de michael.meidinger@tum.de
Supervisor:
Florian Maurer, Michael Meidinger
Master's Theses
Supervisor:
Michael Meidinger, Fabian Schätzle (Forschungszentrum Jülich GmbH)