MSEI/MSCE Research Internships

Some of the MSEI/MSCE research internships listed here can also be carried out in the context of the Project Lab Integrated Systems. If this applies, it is explicitly mentioned in the associated topic description.

Available Topics

Interested in an internship or a thesis?
New topics are often in preparation and not yet listed here. It is sometimes also possible to define a topic that matches your specific interests. Therefore, do not hesitate to contact our scientific staff if you are interested in contributing to our work. If you have further questions concerning a thesis at the institute, please contact Dr. Thomas Wild.

Implementation and Evaluation of Hardware Match-Action Tables on FPGA

Description

With the advent of research on the next generation of
mobile communications (6G), we are engaged in exploring
architecture extensions for Smart Network Interface Cards
(SmartNICs). To enable adaptive, energy-efficient and
low-latency network interfaces, we are prototyping a
custom packet processing pipeline on FPGA-based NICs,
partially based on the open-nic project
(https://github.com/Xilinx/open-nic).

Incoming packet flows need to be differentiated and processed
differently, which is typically done with match-action tables (MATs).
MATs match on a certain packet condition (e.g. the 5-tuple of the packet header) and execute a corresponding action (e.g. dropping, forwarding or modifying the packet). A recent Xilinx IP core implements MATs that can be programmed with P4, a packet processing language that is gaining momentum in networking. The goal of this work is to investigate the implementation of MATs in hardware, integrate them into our current HDL design based on open-nic, and test and evaluate the results.
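
As a rough software illustration of the match-action principle described above, the following Python sketch models an exact-match table keyed on the packet 5-tuple. All names (FiveTuple, MatchActionTable, drop, forward_to) are hypothetical and are not taken from open-nic, P4 or the Xilinx IP core. A hardware MAT would typically rely on hash tables or TCAMs rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FiveTuple:
    """Hypothetical match key: the classic packet header 5-tuple."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


# An action receives a packet (modelled as a plain dict) and returns the
# possibly modified packet, or None if the packet is dropped.
Action = Callable[[dict], Optional[dict]]


def drop(packet: dict) -> Optional[dict]:
    """Action: discard the packet."""
    return None


def forward_to(port: int) -> Action:
    """Action factory: tag the packet with an egress port."""
    def action(packet: dict) -> Optional[dict]:
        return {**packet, "egress_port": port}
    return action


class MatchActionTable:
    """Exact-match table with a default action for table misses."""

    def __init__(self, default_action: Action) -> None:
        self._entries: Dict[FiveTuple, Action] = {}
        self._default = default_action

    def add_entry(self, key: FiveTuple, action: Action) -> None:
        self._entries[key] = action

    def process(self, key: FiveTuple, packet: dict) -> Optional[dict]:
        # Look up the key; fall back to the default action on a miss.
        action = self._entries.get(key, self._default)
        return action(packet)


table = MatchActionTable(default_action=drop)
known_flow = FiveTuple("10.0.0.1", "10.0.0.2", 1234, 80, 6)
table.add_entry(known_flow, forward_to(port=1))

forwarded = table.process(known_flow, {"payload": b"hello"})
missed = table.process(FiveTuple("10.0.0.9", "10.0.0.2", 1, 80, 6),
                       {"payload": b"x"})
```

Here the known flow is tagged with egress port 1, while the unknown flow falls through to the default action and is dropped.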

Prerequisites

  • Programming skills in VHDL/Verilog and C (and Python)
  • Practical experience with FPGA design and implementation
  • Good knowledge of computer networks, the OSI layer model and protocols
  • Preferably basic knowledge of the P4 packet processing language

Contact

Marco Liess, M. Sc.

Tel.: +49.89.289.23873
Room: N2139
Email: marco.liess@tum.de

Supervisor:

Marco Liess

Assigned Topics

Comparison of Safety Guarantee Mechanisms for LCTs

Description

This thesis compares safety mechanisms for learning classifier tables (LCTs) and determines through simulations which one performs best without violating any constraints.
To achieve this, different approaches (shielding, forbidden classifier) have to be implemented in MATLAB, and good settings have to be found for each implementation.

The project is divided into two phases.

To accomplish our objective, we will implement different approaches, such as the Forbidden Classifier, Preemptive Shielding, and Post-posed Shield. We will compare these implementations with the archive we already have. Throughout the implementation process, we will determine several properties, including how to build the forbidden classifier table, how to build the shield (i.e., which actions should be valid), and how to set the reward for the post-posed shield. Optimizing performance may require fine-tuning.
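
To make the three approaches more concrete, here is a minimal Python sketch (the thesis itself targets MATLAB). It assumes a toy setting that is not part of the topic description: the state is a speed level, the actions change it by -1, 0 or +1, and the constraint is that the speed stays within 0..MAX_SPEED. The function names and the interpretation of the forbidden classifier table as a set of forbidden (state, action) pairs are assumptions for illustration only.

```python
from typing import Callable, List, Set, Tuple

State = int
Action = int
SafetyCheck = Callable[[State, Action], bool]

ACTIONS: List[Action] = [-1, 0, 1]  # hypothetical: slow down, keep, speed up
MAX_SPEED = 2                       # hypothetical safety constraint


def is_safe(state: State, action: Action) -> bool:
    """Toy constraint: the resulting speed level must stay in 0..MAX_SPEED."""
    return 0 <= state + action <= MAX_SPEED


def forbidden_table(states: List[State], actions: List[Action],
                    check: SafetyCheck) -> Set[Tuple[State, Action]]:
    """Enumerate all (state, action) pairs that must never be selected."""
    return {(s, a) for s in states for a in actions if not check(s, a)}


def preemptive_shield(state: State, actions: List[Action],
                      check: SafetyCheck) -> List[Action]:
    """Restrict the choice beforehand: the agent only sees valid actions."""
    return [a for a in actions if check(state, a)]


def post_posed_shield(state: State, proposed: Action, actions: List[Action],
                      check: SafetyCheck,
                      penalty: float = -1.0) -> Tuple[Action, float]:
    """Check the agent's choice afterwards and override it if it is unsafe.

    Returns the action to execute and an extra reward term (a penalty if
    the shield had to intervene).
    """
    if check(state, proposed):
        return proposed, 0.0
    safe = [a for a in actions if check(state, a)]
    if not safe:
        raise RuntimeError("no safe action available in this state")
    return safe[0], penalty


forbidden = forbidden_table([0, 1, 2], ACTIONS, is_safe)
allowed_at_max = preemptive_shield(2, ACTIONS, is_safe)
overridden = post_posed_shield(2, 1, ACTIONS, is_safe)
accepted = post_posed_shield(1, 1, ACTIONS, is_safe)
```

The penalty value returned by the post-posed shield corresponds to the reward setting mentioned above and would be one of the parameters to tune.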

After completing the simulation phase in MATLAB, we will decide whether to implement the approach in hardware or test it in our Duckietown environment.

Contact

flo.maurer@tum.de

Supervisor:

Hardware Validation Intern

Description

Research Internship at Apple

Supervisor:

Anmol Prakash Surhonne - Sven Engleitner (Apple)

Duckietown - Image Processing on FPGAs

Description

At LIS we want to use the Duckietown hardware and software ecosystem for experimenting with our reinforcement learning based learning classifier tables (LCT) as part of the control system of the Duckiebots: https://www.ce.cit.tum.de/lis/forschung/aktuelle-projekte/duckietown-lab/

More information on Duckietown can be found on https://www.duckietown.org/.

In this student work, we want to enable the use of the FPGA in the Lane Detection.
Previous work already experimented with the communication between the NVIDIA Jetson and the FPGA via DMA.

The goal of this work is to port the LSD to the FPGA so that parts of the Lane Detection algorithm are offloaded from the CPU and executed with hardware acceleration.
In the end, the result should integrate seamlessly into the Lane Following Pipeline.

Prerequisites

  • Knowledge about Image Processing
  • Lots of FPGA experience
  • VHDL
  • Python

Contact

flo.maurer@tum.de
michael.meidinger@tum.de

Supervisor:

Florian Maurer, Michael Meidinger

Development of a Packet Forwarding Application in RTEMS

Description

In our IPF project, we optimize application execution during runtime by using self-aware DVFS and task mapping algorithms.

For "real-world" testing, we need a packet forwarding application running on our SparcV8 processors. This application should periodically generate and process packets and provide metrics such as "generation time", "scheduling time" and "deadline".
It should also be possible to discard packets if the deadline is exceeded.
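
The requested behaviour can be sketched in a few lines of Python using simulated time ticks. This is only an illustration under stated assumptions: the target application would run on RTEMS, the names Packet, generate_packets and forward are hypothetical, and "deadline exceeded" is interpreted here as "processing could not start by the deadline".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    seq: int
    generation_time: int                   # tick at which the packet was created
    deadline: int                          # absolute tick by which processing must start
    scheduling_time: Optional[int] = None  # tick at which processing started


def generate_packets(count: int, period: int,
                     relative_deadline: int) -> List[Packet]:
    """Periodically generate packets: one packet every `period` ticks."""
    return [Packet(seq=i,
                   generation_time=i * period,
                   deadline=i * period + relative_deadline)
            for i in range(count)]


def forward(packets: List[Packet],
            service_time: int) -> Tuple[List[Packet], List[Packet]]:
    """Process packets in order on a single worker.

    A packet whose deadline has already passed when the worker becomes
    free is discarded instead of being processed.
    """
    now = 0
    forwarded: List[Packet] = []
    discarded: List[Packet] = []
    for packet in packets:
        now = max(now, packet.generation_time)  # wait until the packet exists
        if now > packet.deadline:
            discarded.append(packet)
            continue
        packet.scheduling_time = now
        now += service_time
        forwarded.append(packet)
    return forwarded, discarded


done, dropped = forward(generate_packets(count=4, period=2,
                                         relative_deadline=3),
                        service_time=4)
forwarded_seqs = [p.seq for p in done]
dropped_seqs = [p.seq for p in dropped]
```

With a service time longer than the period, the worker falls behind, so packet 2 misses its deadline and is discarded while packets 0, 1 and 3 are forwarded.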

It is planned to implement this application within RTEMS.

Prerequisites

  • Experience in low-level programming (registers, timers, etc.)
  • Experience with real time operating systems
  • Strong problem-solving skills, attention to detail, and the ability to work both independently and collaboratively in a team environment

Contact

michael.meidinger@tum.de
flo.maurer@tum.de

Supervisor:

Michael Meidinger, Florian Maurer

Design and Implementation of a Stride Prefetching Mechanism in SystemC

Description

Since DRAMs typically come with much higher access latencies than SRAMs, many approaches to reduce DRAM latencies have already been explored, such as caching, access predictors, row buffers, etc.

In the CeCaS research project, we plan to employ an additional mechanism, namely preloading a certain fraction of the DRAM content into a small on-chip SRAM buffer. This requires predicting the cache lines that are likely to be accessed next, preloading them into the SRAM, and answering subsequent memory requests for this data from the SRAM instead of forwarding them to the DRAM itself.
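
The idea behind stride prefetching can be sketched as follows. This is a simplified Python illustration, not the TLM/SystemC model to be developed: the class name, the FIFO replacement policy and the buffer and line sizes are assumptions. A stride is treated as confirmed once the same non-zero distance between consecutive cache lines has been observed twice in a row.

```python
from collections import OrderedDict
from typing import Optional


class StridePrefetcher:
    """Preloads the next predicted cache line into a small SRAM buffer."""

    def __init__(self, buffer_lines: int = 4, line_size: int = 64) -> None:
        self.line_size = line_size
        self.buffer_lines = buffer_lines
        self.buffer: "OrderedDict[int, bool]" = OrderedDict()  # FIFO of lines
        self.last_line: Optional[int] = None
        self.last_stride: Optional[int] = None

    def access(self, address: int) -> bool:
        """Handle a memory request; return True if the SRAM buffer served it."""
        line = address // self.line_size
        hit = line in self.buffer
        if self.last_line is not None:
            stride = line - self.last_line
            # Same non-zero stride seen twice in a row: predict the next line.
            if stride != 0 and stride == self.last_stride:
                self._preload(line + stride)
            self.last_stride = stride
        self.last_line = line
        return hit

    def _preload(self, line: int) -> None:
        """Model copying one cache line from DRAM into the SRAM buffer."""
        if line in self.buffer:
            return
        if len(self.buffer) >= self.buffer_lines:
            self.buffer.popitem(last=False)  # evict the oldest entry
        self.buffer[line] = True


prefetcher = StridePrefetcher()
hits = [prefetcher.access(addr) for addr in (0, 128, 256, 384, 512)]
```

For this access pattern with a constant stride of two cache lines, the first three requests go to the DRAM and the following ones are served from the buffer.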

This functionality should be implemented as a TLM/SystemC model using Synopsys Platform Architect. A baseline system will be provided; the goal is to implement this functionality in its simplest form first. Depending on the progress, this can be extended or refined in subsequent steps.

Close supervision, especially during the initial phase, will be guaranteed. Nevertheless, some experience with TLM modelling (e.g. SystemC Lab of LIS) or C++ programming is required.

 

Prerequisites

  • Experience with TLM modelling (e.g. SystemC Lab of LIS)
  • B.Sc. in Electrical Engineering or similar

 

Contact

Oliver Lenke

o.lenke@tum.de

Supervisor:

Oliver Lenke

Function Chain on Aurix TC3x Boards

Description

The diagnosis companion box (DCB) to be investigated in the EMDRIVE project supports the diagnosis of sporadically occurring systematic errors in in-vehicular networks, which have not been detected at design time. To identify such issues and analyze/correlate them with potential root causes at system runtime, the DCB continuously monitors traffic flows on the in-vehicular network (IVN) for deviations from the expected behavior and performs an initial analysis of potential root causes by inspecting the processor traces of the source of the abnormal behavior.

To showcase the functionality of the DCB, the demonstration scenario as depicted in the figure below is planned. The basis for the demonstration is a functional chain of sub-functions F1, F2 and F3 that together make up an automotive function that is fed by a sensor and produces output for an actuator. The sub-functions are mapped on different ECUs that interchange data among each other via Ethernet. The ECUs are represented by Aurix TFT Boards. An Ethernet switch, which has mirroring functionality, allows forwarding traffic that is exchanged between the regular interfaces to a specific output. This mirroring port feeds one of the inputs of the ZCU102, which acts as the DCB. A TAS server running on the processor cores of the ZCU102 complements the setup. It allows configuring the MCDS on the Aurix boards on request of the control entity of the DCB and fetches the traces captured from the Aurix boards to the DCB for online analysis.

The demonstration setup further provides the option to introduce artificial errors in the processing of sub-functions F1 and/or F2, which lead to an anomaly in the Ethernet communication to the subsequent sub-function F2 and/or F3. This should be detectable by the DCB, which, depending on the concrete communication anomaly, would configure the respective Aurix controller with an appropriate MCDS configuration. The trace data should then be delivered to the second port of the ZCU102.

The task of the planned research internship is to establish the example application that makes up the functional chain to be executed on the interconnected Aurix boards. This encompasses the following sub-tasks:

  • Bring-up of the Aurix boards including the Aurix development environment.
  • Determine an appropriate functional chain that exchanges periodic traffic among its sub-functions and get them running on the Aurix boards.
  • Establish Ethernet-based data exchange.
  • Establish measures to artificially induce a disturbance of the processing within the Aurix cores so that the sub-functions produce an anomaly in their data exchange. (The current working hypothesis is that the periodicity of the traffic is changed.)

Supervisor:

Non-intrusive hardware tracing over Ethernet

Description

Tracing events in hardware components is a powerful tool to monitor, debug and improve existing designs. It provides detailed insights that help to achieve peak performance, but integrating it efficiently is challenging. One of the major challenges of tracing is to collect as much information as possible with ideally no impact on the system under analysis. This ensures that the gained insights are representative of an execution without tracing enabled.

In this work, a hardware tracing component should be designed that takes an arbitrary data input and sends it via an Ethernet connection to a different PC, which performs the postprocessing of the data. The tracing component has to be designed such that sending the data over Ethernet requires no CPU involvement, in order to minimize the impact on the traced system. The component should be integrated into a hardware platform based on a Xilinx Zynq board, which features a heterogeneous ARM multicore setup directly integrated into the ASIC, combined with programmable logic in the FPGA part of the chip. A hardware accelerator that should be traced with the new component is already implemented in the FPGA.

Prerequisites

To successfully complete this work, you should have:

  • good HDL programming skills,
  • experience with microcontroller programming,
  • basic knowledge about Git,
  • first experience with the Linux environment.

The student is expected to be highly motivated and independent.

Contact

Email: lars.nolte@tum.de

Supervisor:

Lars Nolte

Duckietown Autonomous Driving Pipeline - FPGA

Description

At LIS we want to use the Duckietown hardware and software ecosystem for experimenting with our reinforcement learning based learning classifier tables (LCT) as part of the control system of the Duckiebots: https://www.ce.cit.tum.de/lis/forschung/aktuelle-projekte/duckietown-lab/

More information on Duckietown can be found on https://www.duckietown.org/.

In this student work, we want to enable the use of the FPGA in the Lane Detection.
Therefore, the different stages of the Lane Detection Pipeline should be ported to the FPGA.
In order to communicate with the NVIDIA Jetson Nano platform, the ported algorithm has to connect to the Xilinx PCIe DMA IP core.

Prerequisites

  • Knowledge about VHDL and Xilinx IP-cores

Contact

flo.maurer@tum.de

Supervisor:

Comparing DPDK with traditional Linux-based networking

Description

With the ever-increasing network speeds of physical links, the processing of packets on network nodes is becoming more and more of a bottleneck. Packet processing on a standard Linux-based network node traditionally involves the operating system (OS). Since an OS is usually optimized for a range of tasks rather than a specific task, using conventional Linux kernel functionalities for packet processing can degrade performance. For this reason, approaches to bypass the kernel have been proposed to perform network processing in user space.

One approach to bypassing the kernel that has attracted growing interest in recent years is the Data Plane Development Kit (DPDK). By processing packets entirely in user space, DPDK avoids time-consuming context switches between user space and kernel space. This comes at the cost of one CPU core actively polling for new packets, instead of the network interface card (NIC) triggering interrupts for incoming packets. In addition, DPDK itself mainly provides the poll mode drivers for selected NICs, but the processing of the packets is the duty of the application using DPDK. Thus, while DPDK is suitable for certain application scenarios, there are also numerous use cases that are better suited to be implemented using the Linux networking stack. For example, to establish a Transmission Control Protocol (TCP) connection, an additional user space TCP/IP stack must be implemented or taken from open-source projects. These are generally not as feature-rich as the conventional Linux networking stack and do not necessarily improve performance.

This work aims to find a method to compare applications using DPDK with applications using the Linux network stack. Envisioned is a client-server application that uses iperf3 to generate data traffic.

Prerequisites

To successfully complete this work, you should have:

  • very good programming skills in Python and C/C++,
  • basic knowledge about Git,
  • first experience with the Linux environment.

The student is expected to be highly motivated and independent.

Contact

Email: lars.nolte@tum.de

Supervisor:

Lars Nolte

Implement a Neural Network based DVFS controller for runtime SoC performance-power optimization

Keywords:
Neural Networks, DVFS, Machine Learning

Description

Reinforcement learning (RL) has been widely used for run-time management on multi-core processors. RL-based controllers can adapt to varying emerging workloads, system goals, constraints and environment changes by learning from their experiences.

Neural Networks are a set of ML methods which are inspired by the human brain, mimicking the way that biological neurons signal to one another.

In this work, you will

1. Understand the working of neural networks and implement a neural network in C.

2. Understand the architecture of the Leon3-based SoC.

3. Use neural networks to learn and control the processor voltage and frequency at runtime to optimize performance and power.

4. Design, test and implement the work on a Xilinx FPGA.

 

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Good VHDL and C programming skills
  • Good understanding of MPSoCs
  • Self-motivated and structured work style
  • Knowledge of machine learning algorithms

 

Contact

Anmol Surhonne

Technische Universität München
Department of Electrical and Computer Engineering
Chair of Integrated Systems
Arcisstr. 21
80290 München
Germany

Phone: +49.89.289.23872
Fax: +49.89.289.28323
Building: N1 (Theresienstr. 90)
Room: N2137
Email: anmol.surhonne at tum dot de

 

 

Supervisor:

Anmol Prakash Surhonne

Investigation and Implementation of Approximate Comparators for FPGA

Description

Approximate computing is an emerging design paradigm that trades accuracy for resource consumption, i.e. a certain inaccuracy of the calculations is allowed with the goal of reducing the overall resource consumption of the implemented design. One branch of this research field focuses on the approximation of arithmetic units, such as adders, subtractors, multipliers, and dividers. In this research internship, approximate dividers suitable for implementation on FPGAs should be investigated.

The research internship starts with a literature review of state-of-the-art approximate dividers. Relevant literature must be searched and surveyed. Afterwards, the most promising approximate divider designs have to be selected based on the literature review. These designs must then be implemented in VHDL, targeting an FPGA. Finally, a rudimentary evaluation of the implemented dividers has to be performed.

Prerequisites

The student should have the following skills in order to successfully complete the research internship:

  • Good ability to understand technical and scientific literature (e.g. IEEE or ACM papers)
  • Analytical thinking
  • Good programming skills in VHDL
  • The ability to work independently
  • High motivation
  • Previous experience with approximate computing is helpful, but not strictly required.

The student can work on the research internship remotely from a home office.

Contact

Arne Kreddig
Doctoral Candidate at LIS, TUM 
FPGA Design Engineer at SmartRay GmbH

arne.kreddig@smartray.com

Supervisor:

Arne Kreddig

Investigation and Implementation of Approximate Dividers for FPGA

Description

Approximate computing is an emerging design paradigm that trades accuracy for resource consumption, i.e. a certain inaccuracy of the calculations is allowed with the goal of reducing the overall resource consumption of the implemented design. One branch of this research field focuses on the approximation of arithmetic units, such as adders, subtractors, multipliers, and dividers. In this research internship, approximate dividers suitable for implementation on FPGAs should be investigated.

The research internship starts with a literature review of state-of-the-art approximate dividers. Relevant literature must be searched and surveyed. Afterwards, the most promising approximate divider designs have to be selected based on the literature review. These designs must then be implemented in VHDL, targeting an FPGA. Finally, a rudimentary evaluation of the implemented dividers has to be performed.
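
To illustrate the accuracy-versus-resources trade-off, the following Python sketch models a deliberately naive approximation: both operands are truncated by a number of least significant bits before an ordinary integer division, which in hardware would correspond to a narrower divider. This scheme and the function names are assumptions chosen for illustration. It is not one of the state-of-the-art designs the literature review is meant to identify.

```python
def approx_div(dividend: int, divisor: int, drop_bits: int) -> int:
    """Approximate unsigned division on truncated operands.

    Dropping `drop_bits` least significant bits from both operands
    leaves the quotient roughly unchanged while shrinking the divider.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    truncated_divisor = divisor >> drop_bits
    if truncated_divisor == 0:
        # Divisor too small to truncate: fall back to the exact result.
        return dividend // divisor
    return (dividend >> drop_bits) // truncated_divisor


def mean_relative_error(drop_bits: int, width: int = 8) -> float:
    """Average relative quotient error over all operand pairs of `width` bits.

    Pairs with an exact quotient of zero are skipped to avoid dividing
    by zero in the relative error.
    """
    total, count = 0.0, 0
    for a in range(2 ** width):
        for b in range(1, 2 ** width):
            exact = a // b
            if exact == 0:
                continue
            total += abs(approx_div(a, b, drop_bits) - exact) / exact
            count += 1
    return total / count


close_result = approx_div(200, 24, 3)   # exact quotient is 8
poor_result = approx_div(250, 30, 4)    # exact quotient is 8
error_without_truncation = mean_relative_error(0)
```

A rudimentary evaluation as mentioned above could compare such an error metric against the FPGA resources saved for different values of drop_bits.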

Prerequisites

The student should have the following skills in order to successfully complete the research internship:

  • Good ability to understand technical and scientific literature (e.g. IEEE or ACM papers)
  • Analytical thinking
  • Good programming skills in VHDL
  • The ability to work independently
  • High motivation
  • Previous experience with approximate computing is helpful, but not strictly required.

The student can work on the research internship remotely from a home office.

Contact

Arne Kreddig
Doctoral Candidate at LIS, TUM 
FPGA Design Engineer at SmartRay GmbH

arne.kreddig@smartray.com

Supervisor:

Arne Kreddig

Introspective Failure Prediction Algorithms on FPGAs

Description

Failure cases in autonomous driving are important to collect and investigate to improve the performance of the system before large-scale deployment. Additionally, disengaging the autonomous driving system before the failure takes place can allow the human to take over in good time and maintain safety in uncertain situations. It is important to accelerate these algorithms on low-power, low-latency hardware, as they must run alongside the more compute-intensive autonomous driving stack.

In this research internship, an image-classification convolutional neural network will be trained on a failure prediction dataset, then deployed on a dataflow-based accelerator. The accelerator will be optimized for speed and efficiency, and HW-error cases will be investigated.

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in Python and PyTorch
  • Basic programming skills in HDL/HLS
  • Good knowledge of neural networks, particularly convolutional neural networks

The student is expected to be highly motivated and independent. By completing this project, you will be able to:

  • Optimize CNNs and their target hardware accelerator to improve overall system performance
  • Test and evaluate solutions for correctness and applicability
  • Present your work in the form of a scientific report

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

This project is in cooperation with BMW AG.

Supervisor:

Nael Yousef Abdullah Al-Fasfous

Accelerating Object-Detection Algorithms on NVDLA

Description

Convolutional neural networks (CNNs) are the state of the art for most computer vision tasks. Although their accuracy is unrivaled when compared to classical segmentation and classification algorithms, they present many challenges for implementation on hardware platforms. Most performant CNNs tend to be computationally complex for low-power embedded applications. Finding a good trade-off between accuracy and efficiency can be critical when deciding the network architecture and the target hardware.

This work focuses on accelerating CNNs for object detection on the NVIDIA Deep Learning Accelerator (NVDLA). Different CNNs can be benchmarked, new layers must be added, and execution must be optimized to maintain minimum latency.

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in C++, Python and TensorFlow
  • Good programming skills in HDL
  • Good knowledge of neural networks, particularly convolutional neural networks

The student is expected to be highly motivated and independent. By completing this project, you will be able to:

  • Implement object-detection CNNs on a state-of-the-art accelerator
  • Optimize CNNs through quantization and pruning to improve overall system performance
  • Test and evaluate solutions for correctness and applicability
  • Present your work in the form of a scientific report

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems
Arcisstr. 21
80333 Munich
Germany

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

This project is in cooperation with BMW AG.

Supervisor:

Nael Yousef Abdullah Al-Fasfous

Implementation of an Approximated FIR Filter on FPGA for Laser Line Extraction from Pixel Data

Description

Current 3D laser line scanners have precision in the range of a micrometer. These scanners work on the principle of laser triangulation and use a camera chip in the receive path. The captured pixel data is then processed on an FPGA to generate 3D profile data. In order to do this, the laser line, as seen by the camera, must be extracted from the pixel data. For this task, several methods have been proposed. One of these methods employs an FIR filter to calculate the derivative of the incoming pixel stream orthogonally to the laser line direction. Afterwards, the zero crossing of this derivative is detected. The position of the zero crossing marks the position of the laser line in the camera image. From this position, the distance of the laser scanner to the scanned object can be derived.
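
The FIR-based extraction can be sketched in Python for a single pixel column. The choice of a three-tap central-difference kernel and the linear interpolation of the sub-pixel position are assumptions made for this illustration, as are the function names. An actual implementation would be written in VHDL and may use a different kernel.

```python
from typing import List, Optional, Sequence


def fir(samples: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR filter, returning only fully overlapping outputs."""
    n_taps = len(taps)
    return [sum(taps[k] * samples[n - k] for k in range(n_taps))
            for n in range(n_taps - 1, len(samples))]


def laser_line_position(column: Sequence[float]) -> Optional[float]:
    """Locate the laser line in one pixel column with sub-pixel resolution.

    The derivative is positive on the rising edge of the intensity peak
    and negative on the falling edge, so the peak sits at the first
    positive-to-non-positive zero crossing.
    """
    taps = (1.0, 0.0, -1.0)        # central difference: x[n] - x[n-2]
    delay = (len(taps) - 1) / 2    # offset between output and pixel index
    derivative = fir(column, taps)
    for j in range(len(derivative) - 1):
        d0, d1 = derivative[j], derivative[j + 1]
        if d0 > 0 and d1 <= 0:
            # Linear interpolation between the two neighbouring samples.
            return j + d0 / (d0 - d1) + delay
    return None                    # no laser line found in this column


symmetric_peak = laser_line_position([0, 1, 4, 9, 4, 1, 0])
between_pixels = laser_line_position([0, 2, 8, 8, 2, 0])
no_peak = laser_line_position([0, 1, 2, 3, 4])
```

The symmetric peak is located exactly at pixel index 3, the flat-topped peak halfway between pixels 2 and 3, and a monotonic ramp yields no detection.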

Approximate computing is an emerging design paradigm that trades accuracy for resource consumption, i.e. a certain inaccuracy of the calculations is allowed with the goal of reducing the overall resource consumption of the implemented design. In this thesis, such approximation methods should be integrated into the data processing pipeline and the results should be evaluated.

This thesis includes the implementation of a simple data processing pipeline for the extraction of the laser line from pixel data using an FIR filter-based approach. The implementation should be done in VHDL. Furthermore, the necessity for prefiltering (e.g. smoothing) of the pixel data should be assessed and implemented if necessary. Finally, the potential for the integration of approximate computing methods into the data processing pipeline should be evaluated.

Prerequisites

The student should have the following skills in order to successfully complete the thesis.

  • Good programming skills in VHDL
  • A basic understanding of FIR filter design
  • A basic understanding of image processing
  • The ability to work independently
  • Previous experience with approximate computing is helpful, but not strictly required

The student can work on the thesis remotely from a home office.

Contact

Arne Kreddig
Doctoral Candidate and FPGA Design Engineer
SmartRay GmbH


arne.kreddig@smartray.com

Supervisor:

Arne Kreddig - (SmartRay GmbH)

Graph Neural Network-based Pruning

Description

Convolutional neural networks (CNNs) are the de facto standard for many computer vision (CV) applications. These range from medical technology and robotics to autonomous driving. However, most modern CNNs are very memory and compute intensive, particularly when they are dimensioned for complex CV problems.


Compressing neural networks is essential for a variety of real-world applications. Pruning is a widely used technique for reducing the complexity of a neural network by removing redundant and superfluous parameters. One characteristic of this approach is the pruning granularity, which describes the substructures that should be removed from the neural network. Another aspect is the method for finding the redundant and unused structures, which plays a central role in effective pruning without loss of task-related accuracy. The optimization goal determines which elements (kernel, filter, channel) can be removed from the topology of the CNN.

The goal of this work is to learn the internal relationships between the channels, filters, kernels of the layers by means of a graph neural network, and identify their relevance to the classification task of the CNN. The learned relationships are then used for pruning the neural network.

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in Python and TensorFlow
  • Good knowledge of neural networks, particularly convolutional neural networks

The student is expected to be highly motivated and independent.

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

Supervisor:

Alexander Frickenstein, Nael Yousef Abdullah Al-Fasfous