Master's Theses

Open Theses

Interested in a student research project or final thesis?
Our research groups often have theses in preparation that are not yet listed here. In some cases it is also possible to define a topic that matches your particular interests. To do so, simply contact a staff member from the relevant research area. If you also have general questions about carrying out a thesis at LIS, please contact Dr. Thomas Wild.

Hardware-Aware Layer Fusion of Deep Neural Networks

Description

The dataflow and mapping of Convolutional Neural Networks (CNNs) influence their compute and energy efficiency on edge accelerators. Layer fusion is a concept that enables the processing of multiple CNN layers without resorting to costly off-chip memory accesses. In order to implement layer fusion optimally, different combinations of mapping and scheduling parameters need to be explored. We, at the BMW Group, offer you a challenging master's thesis position that aims to optimize the fusion strategy of a given CNN workload for maximal data reuse and resource utilization.
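To illustrate why layer fusion reduces off-chip accesses, the following is a minimal sketch under strong simplifying assumptions: only activations are counted (weights, tiling and halo regions are ignored), and the function name and the feature map sizes are hypothetical. It is not the cost model used in the thesis.

```python
def offchip_traffic(feature_map_sizes, fused):
    """Estimate off-chip activation traffic (in elements) for a layer chain.

    feature_map_sizes[i] is the size of the tensor entering layer i and the
    last entry is the final output. Weights are ignored (assumption).
    """
    if len(feature_map_sizes) < 2:
        raise ValueError("need at least an input and an output size")
    if fused:
        # Intermediate tensors stay on chip: read first input, write last output.
        return feature_map_sizes[0] + feature_map_sizes[-1]
    # Layer by layer: each layer reads its input and writes its output off chip.
    return sum(a + b for a, b in zip(feature_map_sizes, feature_map_sizes[1:]))


# Hypothetical activation sizes for a chain of three layers.
sizes = [150528, 802816, 802816, 200704]
traffic_unfused = offchip_traffic(sizes, fused=False)
traffic_fused = offchip_traffic(sizes, fused=True)
print(traffic_unfused, traffic_fused)
```

In this toy model the fused schedule moves roughly a tenth of the data; a real exploration would also have to account for on-chip buffer capacity and recomputation.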

Prerequisites

  • Strong knowledge of computer vision concepts and convolutional neural networks.
  • Hands-on experience with Xilinx FPGAs, Verilog/VHDL/HLS.
  • Excellent programming skills in C and Python. Experience with TensorFlow 2, Git, and Docker is a plus.
  • Highly motivated and eager to collaborate in a team.
  • Fluent spoken and written English.

Contact

Shambhavi.balamuthu-sampath@bmw.de

Supervisors:

Walter Stechele - Shambhavi.balamuthu-sampath@bmw.de (BMW)

Ongoing Theses

Hardware-Accelerated Linux Kernel Tracing

Description

Tracing events with hardware components is a powerful way to monitor, debug, and improve existing designs. It provides detailed insights that help achieve peak performance, but integrating it with low overhead is challenging. One of the major challenges of tracing is to collect as much information as possible with ideally no impact on the system under analysis. This ensures that the insights gained are representative of an execution without tracing enabled. In this work, a hardware tracing component should be leveraged to reduce the intrusiveness of existing software tracing mechanisms in the Linux kernel.

The work should be integrated and tested on a hardware platform based on a Xilinx Zynq board. The Zynq chip features a heterogeneous ARM multicore setup integrated directly into the ASIC, combined with programmable logic in the FPGA part of the chip. A hardware accelerator is already implemented in the FPGA and should be traced with the new component.

Prerequisites

To successfully complete this work, you should have:

  • experience with microcontroller programming,
  • basic knowledge of Git,
  • initial experience with the Linux environment.

The student is expected to be highly motivated and independent.

Supervisor:

Lars Nolte

Multicore Optimization of an Image Processing System

Description

In industrial environments, information is increasingly stored in visual codes (e.g. barcodes, QR codes) for automated processing. Rising throughput places ever higher demands on data processing speed.
This thesis uses a low-cost, commercially available multicore system to investigate to what extent processing speeds previously achieved with dedicated hardware can be reached by parallelizing the evaluation steps on CPU systems.
In particular, it examines whether specialized co-processors (e.g. Vector Processing Units (VPUs)) can contribute to the acceleration, or how such processors can be designed and optimized for the task (Application-Specific Instruction-set Processor (ASIP)).

Supervisor:

Marco Liess

A Deep Dive into C-States, Idle Governors and the Prospects of an eBPF Idle Governor

Description

Linux is one of the most widely used operating systems in embedded systems and cloud infrastructure worldwide. As sustainability becomes more relevant, saving power is a crucial aspect, which makes efficient Linux power management increasingly important.


Power management in Linux is implemented in several kernel subsystems that correspond to hardware characteristics such as P-States (frequency scaling) and C-States (sleep states). This thesis examines the idle power management of Linux and therefore focuses on C-States. C-States are per-core states in which individual parts of the core are shut down. Each processor implements C-States differently. A higher C-State number, e.g. C6, corresponds to a deeper sleep with lower energy consumption and a longer wake-up latency.
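As a small illustration of how C-States are exposed on a running system, the sketch below reads the per-state attributes that the Linux cpuidle subsystem publishes in sysfs (`name`, `latency`, `residency` under `/sys/devices/system/cpu/cpuN/cpuidle/stateM/`). The function name and the dictionary keys are illustrative choices, and the available states depend on the processor and driver.

```python
from pathlib import Path


def read_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return the idle states of one CPU as a list of dicts, shallow to deep.

    Returns an empty list if the directory does not exist (e.g. no cpuidle
    driver or a non-Linux system).
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    states = []
    # Directories are named state0, state1, ...; sort numerically, not lexically.
    for state_dir in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        states.append({
            "index": int(state_dir.name[5:]),
            "name": (state_dir / "name").read_text().strip(),
            # Both values are reported by the kernel in microseconds.
            "exit_latency_us": int((state_dir / "latency").read_text()),
            "target_residency_us": int((state_dir / "residency").read_text()),
        })
    return states
```

Calling `read_cstates()` on a Linux machine with an active cpuidle driver lists the states of CPU 0; deeper states typically show larger exit latencies and target residencies.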


The recently released eBPF functionality makes the kernel more programmable, loosening its originally monolithic character. The mechanism can be divided into four components: the eBPF hooks in the kernel; the interfaces; the in-kernel eBPF infrastructure, which verifies eBPF bytecode, compiles it into native code and executes it; and finally the eBPF application itself, which can be written in a C-like dialect and compiled into eBPF bytecode by LLVM or GCC.


This thesis aims to analyze and compare the idle governors in the current kernel in specific situations. It should also provide insight into C-State usage depending on the architecture. The data is acquired using specific tracepoints within the kernel, which can be recorded and parsed with the kernel tool perf. Furthermore, we explore the feasibility of a custom eBPF-powered idle governor.
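The core decision an idle governor has to make can be sketched in a few lines. The example below is a simplified illustration, not the logic of any particular in-kernel governor: it assumes the states are ordered from shallow to deep with increasing exit latency and target residency, and the state table and function name are hypothetical.

```python
def select_idle_state(states, predicted_idle_us, latency_limit_us):
    """Pick the deepest state that fits the predicted idle time and latency limit.

    states: list of (name, exit_latency_us, target_residency_us), ordered from
    shallow to deep (assumption). Falls back to the shallowest state.
    """
    chosen = states[0][0]
    for name, exit_latency_us, target_residency_us in states:
        if target_residency_us > predicted_idle_us:
            break  # sleeping this deep would not pay off for such a short idle period
        if exit_latency_us > latency_limit_us:
            break  # waking up would take longer than the system tolerates
        chosen = name
    return chosen


# Hypothetical state table; real values come from the cpuidle driver.
STATES = [("C1", 2, 2), ("C3", 50, 150), ("C6", 200, 800)]
print(select_idle_state(STATES, predicted_idle_us=1000, latency_limit_us=500))
```

The hard part in practice is the prediction of the idle duration, which is where existing governors differ and where a custom eBPF-based policy could plug in.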

Supervisors:

Marco Liess - Hagen Pfeifer (Rohde & Schwarz)

Design and Implementation of a Memory Prefetching Mechanism on an FPGA Prototype

Keywords:
VHDL, C Programming, Distributed Memory, Data Migration, Task Migration, Hardware Accelerator

Description

Its main advantages, a simple design with only one transistor per bit and a high memory density, make DRAM omnipresent in most computer architectures. However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles. In order to reduce the DRAM access latency, memory prefetching is a common technique to access data prior to their actual usage. However, this requires sophisticated prediction algorithms in order to prefetch the right data at the right time.
The goal of this thesis is to design and implement a DRAM preloading mechanism in an existing FPGA-based prototype platform and to evaluate the design appropriately.
Towards this goal, you'll complete the following tasks:
1. Understand the existing memory access mechanism
2. Implement the preloading functionality in VHDL
3. Write and execute small bare-metal test programs
4. Analyse and discuss the performance results
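To make the prediction problem behind prefetching concrete, here is a minimal software model of a stride prefetcher. It is only an illustration: the thesis does not prescribe this algorithm, the prefetch buffer is modeled as an unbounded set, and all names are hypothetical.

```python
def simulate_stride_prefetcher(addresses):
    """Count how many accesses are served from the prefetch buffer.

    A prefetch for (addr + stride) is issued whenever the same non-zero
    stride has been observed twice in a row.
    """
    prefetched = set()   # addresses fetched ahead of time (unbounded, assumption)
    last_addr = None
    last_stride = None
    hits = 0
    for addr in addresses:
        if addr in prefetched:
            hits += 1
            prefetched.discard(addr)
        if last_addr is not None:
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                prefetched.add(addr + stride)  # predict the next access
            last_stride = stride
        last_addr = addr
    return hits


print(simulate_stride_prefetcher([0, 64, 128, 192, 256]))
print(simulate_stride_prefetcher([0, 5, 100, 7]))
```

A regular access pattern is predicted after a short warm-up, while an irregular one yields no hits, which is why the choice of prediction algorithm matters for the evaluation.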

Prerequisites

  • Good knowledge of MPSoCs
  • Good VHDL skills
  • Good C programming skills
  • High motivation
  • Independent work style

Contact

Oliver Lenke

o.lenke@tum.de

Supervisor:

Oliver Lenke

Implementation of a SmartNIC-based HW Accelerator for Algorand Relay Nodes to broadcast Blockchain Messages

Description

The Algorand protocol is an environmentally friendly Blockchain technology based on the Proof-of-Stake (POS) consensus mechanism. It represents a new platform for smart contracts trying to solve the blockchain trilemma consisting of scalability, decentralization and security. As part of the ACE-SUPPRA project (Security, Usability, Performance, and Privacy Research in Algorand) we are investigating ways to accelerate the forwarding and broadcasting of Algorand messages throughout the blockchain network with the help of SmartNIC-based HW accelerators to increase the achievable transmission throughput and decrease latencies as well as power consumption.

To this end, the goal of this master thesis is to develop an extension of an existing packet reception, forwarding and delivery SmartNIC design to detect and relay Algorand transaction, block proposal, voting and consensus messages to a given set of network peers. The implementation will require an Algorand message detection entity consisting of a modified packet header parser and a Match-Action-Table. Furthermore, a PCIe-based configuration module for communicating with an attached host PC will be necessary to receive updates on new TCP connections and the IP addresses of the current peer list. The design will also encompass a high priority and bulk broadcast queue for Algorand messages alongside a suitable egress scheduler as well as a message memory and broadcast module for the transmission to four connected peers. Finally, a Packetizer unit will have to be designed, assembling TCP/IP packets and Algorand messages out of multiple Ethernet frames after reception, and vice versa also splitting messages into individual Layer 2 frames prior to their transmission.

Towards this goal you will complete the following tasks:
•    Research existing methods for relaying and broadcasting blockchain messages
•    Implement the design on the NetFPGA-SUME or AMD Alveo U55C prototyping platform
•    Compare and evaluate the implementation with the SW-based Golang implementation of Algorand
•    Document your work in a written thesis report and present your work in a presentation
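The splitting and reassembly performed by the Packetizer unit described above can be sketched in software. This is a simplified model under stated assumptions: the per-fragment header `(sequence number, fragment count)` is hypothetical and stands in for the real TCP/IP sequencing, and the function names are illustrative.

```python
def split_message(message: bytes, max_payload: int):
    """Split a message into (seq, total, payload) fragments of bounded size."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    total = max(1, -(-len(message) // max_payload))  # ceiling division
    return [
        (seq, total, message[seq * max_payload:(seq + 1) * max_payload])
        for seq in range(total)
    ]


def reassemble(fragments):
    """Rebuild the message from fragments that may arrive out of order."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    total = ordered[0][1] if ordered else 0
    if [frag[0] for frag in ordered] != list(range(total)):
        raise ValueError("missing or duplicated fragment")
    return b"".join(frag[2] for frag in ordered)


message = bytes(range(256)) * 20                      # 5120-byte dummy message
fragments = split_message(message, max_payload=1460)  # hypothetical payload size
assert reassemble(list(reversed(fragments))) == message
```

A hardware implementation would additionally need buffering and timeouts for incomplete messages, which this sketch leaves out.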

Prerequisites

To successfully complete this project, you should already have the following skills and experience:
•    Project Laboratory IC-Design or equivalent course
•    Good knowledge of Verilog or VHDL
•    Experience with Xilinx Vivado Design Suite and Synopsys VCS / Mentor Graphics ModelSim (tools will be provided)
•    Self-motivated and structured work style

Contact

Interested? Questions? Do not hesitate to contact me!


Franz Biersack
Chair of Integrated Systems
Arcisstraße 21, 80333 Munich
Tel. +49 89 289 23869
franz.biersack@tum.de
www.ce.cit.tum.de/lis

Supervisor:

Franz Biersack

SmartNIC Enhancements for Network Node Resilience

Description

The Chair of Integrated Systems participates in the DFG Priority Program “Resilient Connected Worlds” of the German Research Foundation (SPP 2378). Our goal is to investigate which resilience functions that are conventionally provided by the central compute resources of Internet networking or compute nodes can meaningfully be migrated onto the Network Interface Card (NIC). By inspecting packet streams at full line rate (10 to 40 Gbps), a set of resilience functions, such as access shields against a known set of traffic flows or redundant flow processing for a selected and configured number of flows, shall be offloaded from centralized compute resources and offered in a more performant and energy-efficient manner. Flows are identified by their so-called 5-tuple, consisting of the source and destination IP addresses, the transport protocol ports, and the protocol field of the IP packet header.
During the Bachelor/Master thesis, you will develop VHDL code for realizing one or more of the SmartNIC resilience building blocks: matching 5-tuples against a preconfigured set of addresses, duplicating packets for delivery to different processor cores or threads, and investigating methods to flexibly perform the match on the entire 5-tuple or a variable subsection of it.
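As a behavioral reference for the 5-tuple matching block, the following sketch matches packets against a rule list in which any field may be left as a wildcard, which models matching on a variable subsection of the 5-tuple. The class, function names and rule set are hypothetical; the addresses come from the documentation range 192.0.2.0/24.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: Optional[str]
    dst_ip: Optional[str]
    src_port: Optional[int]
    dst_port: Optional[int]
    protocol: Optional[int]  # IP protocol number, e.g. 6 = TCP, 17 = UDP


def matches(rule: FiveTuple, packet: FiveTuple) -> bool:
    """A rule field set to None acts as a wildcard for that field."""
    return all(r is None or r == p for r, p in zip(rule, packet))


def first_match(rules, packet):
    """Return the index of the first matching rule, or None if none matches."""
    for index, rule in enumerate(rules):
        if matches(rule, packet):
            return index
    return None


# Hypothetical rule set: one exact flow, then any UDP traffic to port 53.
RULES = [
    FiveTuple("192.0.2.1", "192.0.2.2", 40000, 443, 6),
    FiveTuple(None, None, None, 53, 17),
]
packet = FiveTuple("192.0.2.7", "192.0.2.9", 51515, 53, 17)
print(first_match(RULES, packet))
```

In hardware, the wildcard behavior would typically map to a ternary match with per-field masks rather than a sequential search.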

Prerequisites

  • VHDL coding, synthesis and FPGA prototyping
  • Broadband communication or Internet networking technologies,
    in particular OSI Layer packet header formats
  • Digital circuit design

Contact

Marco Liess
Room N2139
Tel. 089 289 23873
marco.liess@tum.de

Supervisor:

Marco Liess

Parsimonious Semantic Segmentation Training Using Active Learning and Synthetic Data

Description

The goal of this thesis is to implement an augmentation pipeline for both runtime accuracy improvement and training-time generalization. At training time, the augmented examples add diversity to the dataset, while at runtime the augmentation injects more information in addition to the RGB color channels to help the CNN detect semantic segmentation features. The thesis will also explore different loss formulas and loss learning to make training semantic segmentation easier with fewer labeled examples. Finally, the CNN will be pruned and quantized for faster execution, while the rest of the processing pipeline (pre- and post-processing) will be accelerated on GPU.

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in Python and TensorFlow
  • Good knowledge of neural network training theory
  • Experience with convolutional neural networks for semantic segmentation

The student is expected to be highly motivated.

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

Supervisor:

Nael Yousef Abdullah Al-Fasfous

Neural Style Transfer for Synthetic Data

Description

Neural networks have become the state of the art in solving a variety of computer vision problems, often outperforming classical image processing algorithms by a large margin. These applications range from autonomous vehicles to complex control of robots. However, training neural networks presents some difficulties. First and foremost is the human effort required to collect and label suitable training data (in sufficient number and covering critical situations) in production environments. Synthetic training data is one potential solution to this challenge.

Prerequisites

In the context of this work, a neural network for the control of an automated production line should be trained using synthetic data. For this purpose, the following milestones are planned:

  • Development of a 3D model for the generation of training data.
  • Automatic synthesis of images and ground truth data to train the image processing algorithm.
  • Adaptation of the synthetic training data to the real world (style transfer)
  • Outperforming neural networks classically trained on a limited amount of real data

The student is expected to be highly motivated.

Contact

Alexander Frickenstein
Email: alexander.frickenstein@tum.de

Supervisors:

Alexander Frickenstein, Nael Yousef Abdullah Al-Fasfous

Anomaly Detection and Active Learning for Semantic Segmentation Tasks

Description

Clean, labeled datasets are an invaluable asset to research and industry for training and deploying machine learning algorithms such as convolutional neural networks (CNNs). Procuring such datasets involves data collection, sorting and labeling, all of which are typically done by humans. This process is time-consuming, costly and does not scale well, even when outsourced.

The field of anomaly detection and active learning aims to tackle these challenges. In active learning, a CNN can be trained on a small set of labeled data. Once deployed in a real-world scenario, an uncertainty or loss predictor can be implemented alongside the algorithm to predict which data would result in high loss for the model. These non-trivial examples can be collected actively during deployment and forwarded to humans or more complex algorithms to observe, label and retrain the deployed CNN on. In anomaly detection, a network can predict which samples represent outliers or interesting anomalies with respect to the rest of the dataset. This further helps humans clean and sort such examples accordingly.

The goal of this thesis is to implement an anomaly detector and an uncertainty head for a CNN-based semantic segmentation application. The implementation will be tested on a real-world industrial AI application.
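The active-learning step of choosing which samples to send for labeling can be illustrated with a simple uncertainty measure. The sketch below uses the mean per-pixel entropy of the predicted class probabilities as a stand-in for the uncertainty or loss predictor mentioned above; this substitution, the function names and the toy predictions are assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_pixel_entropy(pixel_probs):
    """Average entropy over all pixels of one image."""
    return sum(entropy(p) for p in pixel_probs) / len(pixel_probs)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` most uncertain images, most uncertain first."""
    scores = {image_id: mean_pixel_entropy(p) for image_id, p in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)[:budget]


# Toy predictions: image id -> per-pixel class probabilities (two pixels, two classes).
predictions = {
    "confident": [[0.99, 0.01], [0.98, 0.02]],
    "uncertain": [[0.50, 0.50], [0.60, 0.40]],
    "medium": [[0.90, 0.10], [0.80, 0.20]],
}
print(select_for_labeling(predictions, budget=2))
```

Images selected this way would be forwarded to human annotators and added to the training set before the deployed CNN is retrained.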

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in Python and TensorFlow
  • Good knowledge of neural network training theory
  • Experience with convolutional neural networks for semantic segmentation

The student is expected to be highly motivated.

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

Supervisor:

Nael Yousef Abdullah Al-Fasfous

Learning to Prune and Quantize Transformers

Description

Advances in deep learning architectures for computer vision applications have led to new neural architectures such as vision transformers. These differentiate themselves from typical convolutional neural network-based implementations by decoupling the process of feature aggregation and transformation. Excellent performance is achieved through self-attention and self-supervision.

In this master's thesis, vision transformers will be implemented in the first step. Following verification of state-of-the-art results, the transformers will be compressed through quantization and pruning to minimize their computational complexity on the inference hardware.
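The two compression steps can be illustrated on a plain list of weights. The sketch below shows unstructured magnitude pruning followed by symmetric 8-bit quantization; both are common baseline choices picked here for illustration, not necessarily the methods the thesis will learn or use, and the weight values are made up.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


weights = [0.5, -0.05, 1.27, 0.01, -0.8, 0.2]  # made-up example values
pruned = magnitude_prune(weights, sparsity=0.5)
quantized, scale = quantize_int8(pruned)
print(pruned, quantized, scale)
```

Multiplying each integer by `scale` recovers an approximation of the original weight; the gap between the two is the quantization error that has to be traded against hardware savings.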

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in Python and TensorFlow
  • Good knowledge of neural networks, basic knowledge of transformers

The student is expected to be highly motivated and independent.

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

Supervisors:

Nael Yousef Abdullah Al-Fasfous, Alexander Frickenstein

Sparse Lookup Tables with dynamic precision adaptation for image processing on FPGA

Description

In image processing, non-linear transfer functions, such as sigmoid- or logarithm-shaped functions, are used to map the input into different domains. For dedicated FPGA implementations of general image processing pipelines, these transfer functions are usually implemented as LUTs (lookup tables). Although the LUT-based method is more concise than an approximate direct implementation, it consumes a lot of resources. To save FPGA resources, sparse LUTs can be used, but the output is then only accurate to within a certain acceptable range.

To further reduce the resource consumption while maintaining or even improving the output accuracy, we propose a dynamic loading mechanism. In order to make full use of the on-chip resources, instead of placing one sparse LUT on chip, two functionally complementary memory blocks shall be implemented in the data path of the processing pipeline. One of the memory blocks shall be filled only with the data points that fit the local range of the current input data stream. The other works as a general ultra-sparse LUT that maps the input data inaccurately over the global range. In summary, a permanent memory block of very sparse/inaccurate data points should be kept on the FPGA, complemented by a memory block of accurate data points that are dynamically swapped in and out from off-chip memory (DRAM). Based on this proposal, we need to investigate a dynamic loading mechanism for the accurate memory block such that the input falls into the local range with reasonably high probability.
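A behavioral model can help reason about the two memory blocks before moving to hardware. The sketch below is one possible interpretation under stated assumptions: the off-chip DRAM is modeled as a full table, the coarse block holds every `coarse_step`-th entry, and the reload policy (re-center the accurate block on a miss) is a placeholder for the mechanism the thesis is meant to investigate. All names are hypothetical, and `x * x` stands in for a real transfer function.

```python
class TwoLevelLUT:
    """Permanent coarse LUT plus a small accurate block that is reloaded on a miss."""

    def __init__(self, func, input_range, coarse_step, local_size):
        self.full_table = [func(x) for x in range(input_range)]  # models off-chip DRAM
        self.coarse_step = coarse_step
        self.coarse = self.full_table[::coarse_step]             # ultra-sparse global block
        self.local_size = local_size
        self.local_base = 0
        self.local = self.full_table[:local_size]                # accurate local block
        self.reloads = 0

    def lookup(self, x):
        offset = x - self.local_base
        if 0 <= offset < len(self.local):
            return self.local[offset]               # accurate hit in the local range
        value = self.coarse[x // self.coarse_step]  # inaccurate global fallback
        self._reload(x)
        return value

    def _reload(self, x):
        # Placeholder policy: center the accurate block on the missed input.
        last_base = len(self.full_table) - self.local_size
        self.local_base = max(0, min(x - self.local_size // 2, last_base))
        self.local = self.full_table[self.local_base:self.local_base + self.local_size]
        self.reloads += 1


lut = TwoLevelLUT(lambda x: x * x, input_range=1024, coarse_step=64, local_size=32)
results = [lut.lookup(x) for x in (5, 500, 501, 490)]
print(results, lut.reloads)
```

The second lookup misses the local block and returns the coarse value for 448 instead of 500, while the following lookups near 500 are exact after the reload, which mirrors the accuracy versus loading trade-off described above.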

In this work, a prototype of a sparse LUT with a dynamic precision adaptation mechanism should be developed on FPGA. In this thesis, several questions should be answered:

• What does the architecture of the implementation look like?

• What memory configuration should be used?

• How to determine when to load new data for the accurate memory block?

• What is the trade-off between accuracy and resource consumption?

Supervisor:

Arne Kreddig

Runtime Reconfigurable Winograd-based FPGA Accelerator for CNN Inference

Description

Convolutional neural networks have proven their success in extracting features from images and producing predictions for different tasks such as classification, segmentation and object detection. However, the superior performance of modern deep neural networks can mostly be traced back to high model complexity and extensive hardware requirements. In this research internship, the complexity of convolution is reduced by quantization and Winograd minimal filtering algorithms. The prediction quality is regulated using dynamically reconfigurable Winograd acceleration.
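As a small worked example of Winograd minimal filtering, the sketch below implements the one-dimensional case F(2,3): two outputs of a 3-tap filter are computed with four multiplications instead of six. The function names are illustrative; 2D variants used in CNN accelerators nest this idea along both dimensions.

```python
def winograd_f23(d, g):
    """Compute two outputs of a 3-tap filter g over four inputs d (Winograd F(2,3))."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: can be precomputed once per filter.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # Four element-wise multiplications instead of six.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_conv(d, g):
    """Reference: direct sliding-window computation of the same two outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d, g = [4, 5, 6, 7], [1, 2, 3]
print(winograd_f23(d, g), direct_conv(d, g))
```

The saving in multiplications comes at the cost of extra additions and of divisions by two in the filter transform, which interact with quantization and are part of the trade-off to be evaluated.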

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in C/C++
  • Good knowledge of neural networks, particularly convolutional neural networks
  • Knowledge of VHDL/Verilog or OpenCL is a plus.

The student is expected to be highly motivated and independent. By completing this project, you will be able to:

  • Understand the impact of quantization and Winograd convolution on task-specific accuracy.
  • Implement run-time reconfigurable Winograd convolution on FPGA using OpenCL.
  • Evaluate trade-offs between flexibility, prediction accuracy and resource consumption.

 

Contact

Manoj Vemparala
Autonomous Driving
BMW AG

Email: Manoj-Rohit.Vemparala@bmw.de 

-----

Nael Fasfous

Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

Supervisors:

Manoj Rohit Vemparala, Nael Yousef Abdullah Al-Fasfous