MSEI/MSCE Research Internships

Some of the MSEI/MSCE research internships offered here can also be carried out as part of the Project Lab Integrated Systems. Where this applies, it is stated explicitly in the respective topic description.

Available Topics

Interested in an internship or a thesis?
New topics are often in preparation and not yet listed here. Sometimes it is also possible to define a topic that matches your specific interests. Therefore, do not hesitate to contact our scientific staff if you are interested in contributing to our work. If you have further questions concerning a thesis at the institute, please contact Dr. Thomas Wild.

Automotive Ethernet Anomaly Detection for Burst of Packets - ZCU102 Implementation

Description

Context:

Future cars have a wide variety of sensors, such as cameras, LiDARs, and RADARs that generate a large amount of data. This data has to be sent via an intra-vehicular network (IVN) to further processing nodes, and, ultimately, actuators have to react to the sensor input. In between the processing steps, the intra-vehicular network has to ensure that all of the data and control signals reach their destination in time. Hence, next to a large amount of data, there are also strict timing constraints that the intra-vehicular network has to cope with. Therefore, the so-called time-sensitive networking (TSN) has been introduced. The functional safety of such networks plays an important role against the background of highly automated driving. Emerging errors have to be detected early and potential countermeasures have to be taken to keep the vehicle in a safe state. Therefore, highly sophisticated monitoring and diagnosis algorithms are a key requirement for future cars. (See Project EMDRIVE)  

Our approach to such diagnosis builds on non-intrusively monitoring the intra-vehicular network by snooping on data traffic at an interconnect in the car. An analysis of the traffic shall give information about anomalies that occur inside the network as symptoms of an error inside the electrical architecture.

FORSCHUNGSPRAXIS (research internship):

The first step of this work is to become familiar with an existing design of an anomaly detection module that monitors individual packets in a flow. Building on this existing work, several extensions have to be implemented in the hardware design (Verilog/SystemVerilog) to support anomaly detection for bursts of packets. The fault and anomaly types are:

  1. Arrival time of the burst
  2. Inter-packet timing within a single burst
  3. Number of packets in a single burst

The system should be capable of detecting these fault classes and of raising an alert or flag to notify the software about the detected anomaly. It should also be possible to inject these fault classes on request during a demonstration. The design should be simulated and implemented on an FPGA (ZCU102 Zynq board).
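To illustrate the three fault classes, the following Python sketch shows a small software reference model of the kind that could be used to cross-check a hardware implementation in simulation. It is not the existing design: the names `BurstSpec` and `check_burst`, the tolerance parameters, and the use of nanosecond timestamps are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BurstSpec:
    """Expected burst behavior (all fields are hypothetical example parameters)."""
    expected_start_ns: int   # nominal arrival time of the first packet
    start_tolerance_ns: int  # allowed deviation of the burst arrival time
    max_gap_ns: int          # largest allowed gap between consecutive packets
    expected_packets: int    # number of packets in a complete burst


def check_burst(timestamps_ns: List[int], spec: BurstSpec) -> List[str]:
    """Return the anomaly classes found in one burst of packet arrival times."""
    if not timestamps_ns:
        return ["PACKET_COUNT"]
    anomalies = []
    # Fault class 1: arrival time of the burst
    if abs(timestamps_ns[0] - spec.expected_start_ns) > spec.start_tolerance_ns:
        anomalies.append("BURST_ARRIVAL")
    # Fault class 2: timing between packets in a single burst
    gaps = [b - a for a, b in zip(timestamps_ns, timestamps_ns[1:])]
    if any(gap > spec.max_gap_ns for gap in gaps):
        anomalies.append("INTER_PACKET_GAP")
    # Fault class 3: number of packets in a single burst
    if len(timestamps_ns) != spec.expected_packets:
        anomalies.append("PACKET_COUNT")
    return anomalies
```

A model like this only checks an upper bound on the inter-packet gap; whether a lower bound or a per-packet schedule is needed depends on the traffic specification.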

If you are interested, feel free to contact me! Please send your CV as well as a recent transcript.

Prerequisites

The primary skills that will be developed and needed during this project are the following:

  • Proficiency in Verilog/SystemVerilog for FPGA design.
  • Ability to design and implement hardware modules.
  • Experience with FPGA simulation tools (e.g., ModelSim).
  • A strong background in System-on-Chip design.
  • A good understanding of network protocols and their implementation on FPGA platforms

Contact

zafer.attal@tum.de

Supervisor:

Zafer Attal

Assigned Topics

Software Implementation on ZCU102 Zynq Board PS in Correlation to TAS Server

Description

Context:

Future cars have a wide variety of sensors, such as cameras, LiDARs, and RADARs that generate a large amount of data. This data has to be sent via an intra-vehicular network (IVN) to further processing nodes, and, ultimately, actuators have to react to the sensor input. In between the processing steps, the intra-vehicular network has to ensure that all of the data and control signals reach their destination in time. Hence, next to a large amount of data, there are also strict timing constraints that the intra-vehicular network has to cope with. Therefore, the so-called time-sensitive networking (TSN) has been introduced. The functional safety of such networks plays an important role against the background of highly automated driving. Emerging errors have to be detected early and potential countermeasures have to be taken to keep the vehicle in a safe state. Therefore, highly sophisticated monitoring and diagnosis algorithms are a key requirement for future cars. When an anomaly is detected, the TAS server (a tool developed by Infineon) is used to request trace information from the Multi-Core Debug Solution (MCDS), a hardware feature of Aurix boards that is used for debugging and tracing core and bus activities (see Project EMDRIVE).

The Zynq board consists of two parts: Programmable Logic (PL) and Processing System (PS). In this part of the work, the PL will implement the Companion Box, which will continuously monitor the traffic over the Ethernet. The PS handles the tasks related to the TAS server and the MCDS configuration, which requires installing Linux on the Zynq board to run the TAS server. When the Companion Box detects an anomaly, a flag is set so that the software can detect it and react accordingly. Then, a set of configurations will be automatically defined by the TAS server and sent to the MCDS of the targeted Aurix board that generated the anomaly. Once the new configuration is in place, the TAS server will retrieve the traces from the MCDS.

FORSCHUNGSPRAXIS (research internship):

This work comprises the following tasks:

  1. Install Linux on the ZCU102 PS.
  2. Test functionality of Linux.
  3. Install TAS server on the Linux OS of ZCU102 board.
  4. Configure MCDS of the Aurix boards using the TAS server and retrieve traces.
  5. Establish a connection between SW and HW of ZCU102 board (Flag assertion, Memory access).
  6. Automate the process of MCDS configuration and Trace retrieval. 
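Steps 5 and 6 amount to a control loop on the PS: poll the anomaly flag, configure the MCDS of the affected board, and fetch the trace. The Python sketch below shows only this control flow. The TAS server interface and the flag access are not described here, so they are passed in as placeholder callables (`read_anomaly_flag`, `configure_mcds`, `fetch_trace`); all of these names and signatures are hypothetical.

```python
import time
from typing import Callable, List, Optional


def monitor_and_trace(
    read_anomaly_flag: Callable[[], Optional[int]],  # hypothetical: board id or None
    configure_mcds: Callable[[int], None],           # hypothetical: push MCDS config
    fetch_trace: Callable[[int], bytes],             # hypothetical: retrieve trace data
    max_polls: int,
    poll_interval_s: float = 0.0,
) -> List[bytes]:
    """Poll the flag set by the hardware and collect one trace per anomaly."""
    traces = []
    for _ in range(max_polls):
        board_id = read_anomaly_flag()
        if board_id is not None:
            # Anomaly reported: configure the MCDS of that board, then read back
            configure_mcds(board_id)
            traces.append(fetch_trace(board_id))
        time.sleep(poll_interval_s)
    return traces
```

Passing the hardware and TAS accesses in as callables keeps the loop testable on a development machine before the board setup is available.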

If you are interested, feel free to contact me! Please send your CV as well as a recent transcript.

Prerequisites

The primary skills that will be developed and needed during this project are the following:

  • Proficiency in Verilog/SystemVerilog for FPGA design.
  • A solid understanding of Linux OS.
  • An understanding of HW/SW co-design.
  • A strong background in System-on-Chip design.
  • A good knowledge of Python and Shell scripting.

Contact

zafer.attal@tum.de

Supervisor:

Zafer Attal

Functional Chain on Aurix TC3x Boards Implementation - Optical Flow Detection

Description

The Aurix TC3x boards are used to emulate the ECUs of a car for in-vehicle network communication. They reproduce this communication behavior, which serves as a benchmark for network traffic monitors and fault detection modules.

To showcase the Aurix boards' functional chain, an Optical Flow Detection algorithm is proposed. Its input is real-time video from a camera, and its output is the processed video, displayed on a screen or on the Aurix LCD.

The functional chain should be divided into three sub-functions (F1-F2-F3) that together represent the algorithm, with each Aurix board implementing a single sub-function. Data is transferred from one board to another through an Ethernet switch using standard Ethernet protocols.

This encompasses the following sub-tasks:

  • Bring up the Aurix boards, including the Aurix development environment.
  • Implement a functional chain consisting of (F1-F2-F3) that represents an Optical Flow Detection algorithm. 
  • Display the results on a screen or on an Aurix board LCD.
  • Establish Ethernet-based data exchange.

Prerequisites

  • Good knowledge of C programming
  • A solid understanding of System-on-Chip design and the modules of a general-purpose microcontroller

Supervisor:

Zafer Attal

IPF1 Demonstrator to visualize classifier generation using Genetic Algorithms for LCTs

Description

Reinforcement learning (RL) has been widely used for run-time management on multi-core processors. RL-based controllers can adapt to varying emerging workloads, system goals, constraints and environment changes by learning from their experiences.

Learning classifier tables (LCTs) are hardware-based machine learning entities that are applied in our IPF project as low-level controllers for DVFS. LCTs inherit their concept from learning classifier systems, which are rule-based machine learning systems.
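As a rough illustration of the rule-based idea, the following Python sketch models a classifier table in the generic learning-classifier-system style: each rule has a ternary condition, an action, and a fitness, and a genetic operator derives new rules from existing ones. This is a textbook-style simplification, not the IPF hardware implementation; the encoding, the field names, and the fixed crossover point are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Classifier:
    condition: str  # string over {'0', '1', '#'}; '#' matches either bit
    action: int     # e.g. an index into a list of DVFS levels (assumed)
    fitness: float


def matches(condition: str, state: str) -> bool:
    """A rule matches if every non-wildcard position equals the state bit."""
    return len(condition) == len(state) and all(
        c in ("#", s) for c, s in zip(condition, state)
    )


def select_action(table: List[Classifier], state: str) -> Optional[int]:
    """Pick the action of the fittest matching rule, or None if nothing matches."""
    match_set = [c for c in table if matches(c.condition, state)]
    if not match_set:
        return None
    return max(match_set, key=lambda c: c.fitness).action


def crossover(a: Classifier, b: Classifier, point: int) -> Classifier:
    """One-point crossover of two parent conditions; fitness is averaged."""
    condition = a.condition[:point] + b.condition[point:]
    return Classifier(condition, a.action, (a.fitness + b.fitness) / 2)
```

A demonstrator could visualize exactly these quantities over time: which rules match, which one wins, and which new rules the genetic algorithm generates.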

In this work, you will

1. Develop a demonstrator to visualize classifier generation in LCTs.

2. Write low level embedded software in C to communicate between the Matlab GUI and software running on the FPGA.

3. Understand the implementation of LCTs on the Leon3 platform running on a Virtex-7 FPGA.

 

Prerequisites

To successfully complete this project, you should already have the following skills and experiences: 
• Good VHDL, Matlab and C programming skills 
• Good understanding of MPSoCs
• Self-motivated and structured work style
• Knowledge of machine learning algorithms (LCS)

Contact

Anmol Surhonne

Technische Universität München
Department of Electrical and Computer Engineering
Chair of Integrated Systems
Arcisstr. 21
80290 München
Germany

Phone: +49.89.289.23872
Fax: +49.89.289.28323
Building: N1 (Theresienstr. 90)
Email: anmol.surhonne at tum dot de

 

 

Supervisor:

Anmol Prakash Surhonne

FPGA-based Network Tester for 100 Gbps

Description

With the advent of research on the next generation of mobile communications (6G), we are engaged in exploring architecture extensions for Smart Network Interface Cards (SmartNICs). To enable adaptive, energy-efficient and low-latency network interfaces, we are prototyping a custom packet processing pipeline on FPGA-based NICs, partially based on the open-nic project (https://github.com/Xilinx/open-nic).

To test the performance of a SmartNIC-assisted server under peak loads and achieve precise measurements of key performance indicators (KPIs) such as throughput and latency, an FPGA-based Network Tester for 100 Gbps links shall be implemented and tested. For this, the Alveo U55C FPGA-based SmartNICs shall be used. With packet generation and throughput and latency measurements in hardware, maximum performance and precision should be reached.

The goal of this work is to implement the required logic modules in HDL (Verilog), integrate these modules into the OpenNIC Shell platform and test the design on the Alveo U55C FPGAs. Additionally, a software interface to control the network tester can be developed, building on a previous 10 Gbps Network Tester design. The design should also be evaluated with regard to the performance of the packet generation as well as the precision of the throughput and latency measurements.
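The two KPIs can be stated compactly. The Python sketch below shows how throughput and latency could be derived from hardware timestamps once they have been read out by software. The function names and the assumption of nanosecond timestamps taken at transmit and receive are illustrative and not part of the existing tester.

```python
from typing import List, Tuple


def throughput_gbps(packet_sizes_bytes: List[int], window_ns: int) -> float:
    """Payload throughput over a measurement window (bits per ns equals Gbit/s)."""
    return 8 * sum(packet_sizes_bytes) / window_ns


def latency_stats_ns(tx_ns: List[int], rx_ns: List[int]) -> Tuple[int, float, int]:
    """Minimum, mean, and maximum latency from matched TX/RX timestamps."""
    latencies = [rx - tx for tx, rx in zip(tx_ns, rx_ns)]
    return min(latencies), sum(latencies) / len(latencies), max(latencies)
```

Note that this counts only the bytes passed in; on a real Ethernet link the preamble and inter-frame gap also occupy line time, so the achievable frame throughput is below the nominal line rate.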

Prerequisites

  • Programming skills in VHDL/Verilog and C (and Python)
  • Good knowledge of computer networks, the OSI layer model, and protocols
  • Comfortable with the Linux command line and bash
  • Preferably practical experience in FPGA design and implementation

Supervisor:

Marco Liess

Comparison of Safety Guarantee Mechanisms for LCTs

Description

This thesis compares safety implementations for LCTs and decisively determines the superior one through simulations. The aim is to identify the safety mechanism with the best performance without violating any constraints.
To achieve this, different approaches (shielding, forbidden classifier) have to be implemented in MATLAB and good settings have to be found for each implementation.

The project is divided into two phases.

To accomplish our objective, we will implement different approaches, such as the Forbidden Classifier, Preemptive Shielding, and Post-posed Shield. We will compare these implementations with the archive we already have. Throughout the implementation process, we will determine several properties, including how to build the forbidden classifier table, how to build the shield (i.e., what actions should be valid), and how to set the reward for the post-posed shield. Optimizing performance may require fine-tuning.
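The difference between the two shielding variants can be sketched in a few lines. The Python example below is a simplified illustration (the project itself uses MATLAB): a preemptive shield restricts the action set before the learner chooses, while a post-posed shield lets the learner choose and replaces an unsafe choice afterwards, optionally with a penalty reward. The `is_safe` predicate, the penalty value, and the fallback rule (first safe action) are assumptions.

```python
from typing import Callable, List, Tuple


def preemptive_shield(
    actions: List[int], is_safe: Callable[[int, int], bool], state: int
) -> List[int]:
    """Return only the actions the learner is allowed to choose from."""
    return [a for a in actions if is_safe(state, a)]


def post_posed_shield(
    state: int,
    proposed: int,
    actions: List[int],
    is_safe: Callable[[int, int], bool],
    penalty: float = -1.0,
) -> Tuple[int, float]:
    """Return (executed action, extra reward) for a proposed action."""
    if is_safe(state, proposed):
        return proposed, 0.0
    for a in actions:
        if is_safe(state, a):
            # Override the unsafe proposal and punish the learner for it
            return a, penalty
    raise ValueError("no safe action available in this state")
```

How the penalty is chosen, and whether the overridden or the executed action is credited in the learning update, are exactly the kind of settings the simulation phase would need to tune.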

After completing the simulation phase in Matlab, we will make a decision on whether to implement the approach in hardware or test it in our Duckietown environment.

Contact

flo.maurer@tum.de

Supervisor:

Hardware Validation Intern

Description

Research Internship at Apple

Supervisor:

Anmol Prakash Surhonne - Sven Engleitner (Apple)

Duckietown - Image Processing on FPGAs

Description

At LIS we want to use the Duckietown hardware and software ecosystem for experimenting with our reinforcement learning based learning classifier tables (LCT) as part of the control system of the Duckiebots: https://www.ce.cit.tum.de/lis/forschung/aktuelle-projekte/duckietown-lab/

More information on Duckietown can be found on https://www.duckietown.org/.

In this student work, we want to enable the use of the FPGA for lane detection.
Previous work has already experimented with the communication between the NVIDIA Jetson and the FPGA via DMA.

The goal of this work is to port the LSD to the FPGA, so that parts of the lane detection algorithm are offloaded from the CPU and executed in accelerated form on the FPGA.
In the end, there should be a seamless integration into the lane following pipeline.

Prerequisites

  • Knowledge about Image Processing
  • Lots of FPGA experience
  • VHDL
  • Python

Contact

flo.maurer@tum.de
michael.meidinger@tum.de

Supervisor:

Florian Maurer, Michael Meidinger

Design and Implementation of a Stride Prefetching Mechanism in SystemC

Description

Since DRAM typically comes with much higher access latencies than SRAM, many approaches to reduce DRAM latencies have already been explored, such as caching, access predictors, and row buffers.

In the CeCaS research project, we plan to employ an additional mechanism: preloading a certain fraction of the DRAM content into a small on-chip SRAM buffer. This requires predicting the cache lines that are likely to be accessed next, preloading them into the SRAM, and answering subsequent memory requests for this data from the SRAM instead of forwarding them to the DRAM.
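The prediction step of a stride prefetcher can be illustrated with a minimal behavioral model. The Python sketch below is only meant to show the idea; the actual work targets a TLM/SystemC model. It tracks a single access stream, and the class name, the 64-byte line size, and the `degree` parameter are assumptions.

```python
from typing import List, Optional


class StridePrefetcher:
    """Single-stream stride detector working on cache-line granularity."""

    def __init__(self, line_bytes: int = 64, degree: int = 1) -> None:
        self.line_bytes = line_bytes
        self.degree = degree            # number of lines to prefetch ahead
        self.last_line: Optional[int] = None
        self.last_stride: Optional[int] = None

    def access(self, address: int) -> List[int]:
        """Record a demand access and return the addresses to preload."""
        line = address // self.line_bytes
        if self.last_line is None:
            self.last_line = line
            return []
        stride = line - self.last_line
        if stride == 0:
            return []  # same line again: keep the learned stride untouched
        prefetch = []
        if stride == self.last_stride:
            # Stride confirmed twice in a row: predict the next lines
            prefetch = [
                (line + i * stride) * self.line_bytes
                for i in range(1, self.degree + 1)
            ]
        self.last_stride = stride
        self.last_line = line
        return prefetch
```

For example, with `StridePrefetcher(64, 2)` the accesses 0, 128, 256 leave the first two calls without a prediction and make the third return the addresses 384 and 512. A realistic design would typically keep several such entries, for instance indexed by requester, to follow interleaved streams.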

This functionality should be implemented as a TLM/SystemC model using Synopsys Platform Architect. A baseline system will be provided; the goal is to implement this functionality in its simplest form first. Depending on the progress, it can be extended or refined in subsequent steps.

Close supervision, especially during the initial phase, will be guaranteed. Nevertheless, some experience with TLM modelling (e.g. SystemC Lab of LIS) or C++ programming is required.

 

Prerequisites

  • Experience with TLM modelling (e.g. SystemC Lab of LIS)
  • B.Sc. in Electrical Engineering or similar

 

Contact

Oliver Lenke

o.lenke@tum.de

Supervisor:

Oliver Lenke

Function Chain on Aurix TC3x Boards

Description

The diagnosis companion box (DCB) to be investigated in the EMDRIVE project supports the diagnosis of sporadically occurring systematic errors in in-vehicular networks, which have not been detected at design time. To identify such issues and analyze/correlate them with potential root causes at system runtime, the DCB continuously monitors traffic flows on the in-vehicular network (IVN) for deviations from the expected behavior and performs an initial analysis of potential root causes by inspecting the processor traces of the source of the abnormal behavior.

To showcase the functionality of the DCB the demonstration scenario as depicted in the figure below is planned. The basis for the demonstration is a functional chain of sub-functions F1, F2 and F3, that together make up an automotive function that is fed by a sensor and produces output for an actor. The sub-functions are mapped on different ECUs that interchange data among each other via Ethernet. The ECUs are represented by Aurix TFT Boards. An Ethernet switch, which has mirroring functionality, allows forwarding traffic that is exchanged between the regular interfaces to a specific output. This mirroring port feeds one of the inputs of the ZCU 102, which acts as the DCB. A TAS server running on the processor cores of the ZCU102 complements the setup. It allows configuring the MCDS on the Aurix boards on request of the control entity of the DCB and fetches the traces captured from the Aurix boards to the DCB for online analysis.

The demonstration setup further provides the option to introduce artificial errors in the processing of sub-functions F1 or/and F2, which lead to an anomaly in the Ethernet communication to the subsequent sub-function F2 or/and F3. This should be detectable by the DCB, which – depending on the concrete communication anomaly – would configure the respective Aurix controller with an appropriate MCDS configuration. The trace data should then be delivered to the second port of the ZCU 102.

The task of the planned research internship is to establish the example application that makes up the functional chain to be executed on the interconnected Aurix boards. This encompasses the following sub-tasks:

  • Bring-Up of the Aurix boards including the Aurix development environment.
  • Determine an appropriate functional chain that exchanges periodical traffic among its sub-functions and get them running on the Aurix boards.
  • Establish Ethernet based data exchange.
  • Establish measures to artificially induce a disturbance of the processing within the Aurix cores so that the sub-functions produce an anomaly in their data exchange. (Current working hypothesis would be that the periodicity of the traffic is changed.)

Supervisor:

Thomas Wild, Zafer Attal

Implement a Neural Network based DVFS controller for runtime SoC performance-power optimization

Keywords:
Neural Networks, DVFS, Machine learning

Description

Reinforcement learning (RL) has been widely used for run-time management on multi-core processors. RL-based controllers can adapt to varying emerging workloads, system goals, constraints and environment changes by learning from their experiences.

Neural Networks are a set of ML methods which are inspired by the human brain, mimicking the way that biological neurons signal to one another.

In this work, you will

1. Understand the working of Neural Networks. Implement a neural network in C.

2. Understand the architecture of the Leon3 based SoC.

3. Use neural networks to learn and control the processor voltage and frequency at runtime to optimize performance and power.

4. Design, test and implement the work on a Xilinx FPGA.

 

Prerequisites

To successfully complete this project, you should already have the following skills and experiences: 
• Good VHDL and C programming skills 
• Good understanding of MPSoCs
• Self-motivated and structured work style
• Knowledge of machine learning algorithms

 

Contact

Anmol Surhonne

Technische Universität München
Department of Electrical and Computer Engineering
Chair of Integrated Systems
Arcisstr. 21
80290 München
Germany

Phone: +49.89.289.23872
Fax: +49.89.289.28323
Building: N1 (Theresienstr. 90)
Room: N2137
Email: anmol.surhonne at tum dot de

 

 

Supervisor:

Anmol Prakash Surhonne

Investigation and Implementation of Approximate Comparators for FPGA

Description

Approximate computing is an emerging design paradigm that trades in accuracy for resource consumption, i.e. a certain inaccuracy of the calculations is allowed with the goal of reducing the overall resource consumption of the implemented design. One branch in this research field focuses on the approximation of arithmetic units, such as adders, subtractors, multipliers, and dividers. In this research internship, approximate dividers suitable for implementation on FPGA should be investigated.

The research internship starts with a literature research about state-of-the-art approximate dividers. Relevant literature must be searched and surveyed. Afterwards, the most promising approximate divider designs have to be selected based on the literature research. These designs must then be implemented in VHDL targeted for an FPGA design. Finally, a rudimentary evaluation of the implemented dividers has to be performed.

Prerequisites

The student should have the following skills in order to successfully complete the research internship:     

  • Good ability to understand technical and scientific literature (e.g., IEEE or ACM papers)
  • Analytical thinking
  • Good programming skills in VHDL
  • The ability to work independently
  • High motivation
  • Previous experience with approximate computing is helpful, but not strictly required.

The student can work on the research internship remotely from their home office.

Contact

Arne Kreddig
Doctoral Candidate at LIS, TUM 
FPGA Design Engineer at SmartRay GmbH

arne.kreddig@smartray.com

Supervisor:

Arne Kreddig

Investigation and Implementation of Approximate Dividers for FPGA

Description

Approximate computing is an emerging design paradigm that trades accuracy for reduced resource consumption, i.e. a certain inaccuracy of the calculations is allowed with the goal of reducing the overall resource consumption of the implemented design. One branch in this research field focuses on the approximation of arithmetic units, such as adders, subtractors, multipliers, and dividers. In this research internship, approximate dividers suitable for implementation on FPGA should be investigated.

The research internship starts with a literature review of state-of-the-art approximate dividers. Relevant literature must be searched and surveyed. Afterwards, the most promising approximate divider designs have to be selected based on the literature review. These designs must then be implemented in VHDL targeted for an FPGA design. Finally, a rudimentary evaluation of the implemented dividers has to be performed.
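To give a flavor of what an approximate divider does, the following Python sketch models Mitchell's logarithm-based division, one well-known approach from the literature. It is used here purely as an example and is not necessarily among the designs the internship will select. The function name and the floating-point modeling are assumptions; a hardware version would use a leading-one detector, shifters, and a subtractor instead.

```python
def mitchell_divide(a: int, b: int) -> float:
    """Approximate a / b for positive integers via piecewise-linear logarithms."""
    if a <= 0 or b <= 0:
        raise ValueError("operands must be positive")
    # Write x = 2**k * (1 + m) with 0 <= m < 1 and approximate log2(x) as k + m
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    ma = a / (1 << ka) - 1.0
    mb = b / (1 << kb) - 1.0
    # Division becomes a subtraction in the logarithmic domain
    exponent = ka - kb
    fraction = ma - mb
    if fraction < 0:
        # Borrow from the integer part so that 0 <= fraction < 1
        exponent -= 1
        fraction += 1.0
    # Approximate antilogarithm: 2**(e + f) is taken as 2**e * (1 + f)
    return (1.0 + fraction) * 2.0 ** exponent
```

For example, `mitchell_divide(12, 4)` returns the exact result 3.0, while `mitchell_divide(10, 3)` returns 3.5 instead of 3.33. A rudimentary evaluation would sweep the operand range and compare such error figures against the FPGA resource usage of each design.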

Prerequisites

The student should have the following skills in order to successfully complete the research internship:     

  • Good ability to understand technical and scientific literature (e.g., IEEE or ACM papers)
  • Analytical thinking
  • Good programming skills in VHDL
  • The ability to work independently
  • High motivation
  • Previous experience with approximate computing is helpful, but not strictly required.

The student can work on the research internship remotely from their home office.

Contact

Arne Kreddig
Doctoral Candidate at LIS, TUM 
FPGA Design Engineer at SmartRay GmbH

arne.kreddig@smartray.com

Supervisor:

Arne Kreddig

Introspective Failure Prediction Algorithms on FPGAs

Description

Failure cases in autonomous driving are important to collect and investigate to improve the performance of the system before large-scale deployment. Additionally, disengaging the autonomous driving system before the failure takes place can allow the human to take over in good time and maintain safety in uncertain situations. It is important to accelerate these algorithms on low-power, low-latency hardware, as they must run alongside the more compute-intensive autonomous driving stack.

In this research internship, an image-classification convolutional neural network will be trained on a failure prediction dataset, then deployed on a dataflow-based accelerator. The accelerator will be optimized for speed and efficiency, and HW-error cases will be investigated.

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in Python and PyTorch
  • Basic programming skills in HDL/HLS
  • Good knowledge of neural networks, particularly convolutional neural networks

The student is expected to be highly motivated and independent. By completing this project, you will be able to:

  • Optimize CNNs and their target hardware accelerator to improve overall system performance
  • Test and evaluate solutions for correctness and applicability
  • Present your work in the form of a scientific report

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

This project is in cooperation with BMW AG.

Supervisor:

Nael Yousef Abdullah Al-Fasfous

Accelerating Object-Detection Algorithms on NVDLA

Description

Convolutional neural networks (CNNs) are the state of the art for most computer vision tasks. Although their accuracy is unrivaled when compared to classical segmentation and classification algorithms, they present many challenges for implementation on hardware platforms. Most performant CNNs tend to be computationally complex for low-power embedded applications. Finding a good trade-off between accuracy and efficiency can be critical when deciding the network architecture and the target hardware.

This work focuses on accelerating CNNs for object detection on the NVIDIA Deep Learning Accelerator (NVDLA). Different CNNs can be benchmarked, new layers must be added, and execution must be optimized to maintain minimum latency.

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in C++, Python and TensorFlow
  • Good programming skills in HDL
  • Good knowledge of neural networks, particularly convolutional neural networks

The student is expected to be highly motivated and independent. By completing this project, you will be able to:

  • Implement object-detection CNNs on a state-of-the-art accelerator
  • Optimize CNNs through quantization and pruning to improve overall system performance
  • Test and evaluate solutions for correctness and applicability
  • Present your work in the form of a scientific report

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems
Arcisstr. 21
80333 Munich
Germany

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

This project is in cooperation with BMW AG.

Supervisor:

Nael Yousef Abdullah Al-Fasfous

Implementation of an Approximated FIR Filter on FPGA for Laser Line Extraction from Pixel Data

Description

Current 3D laser line scanners have precision in the range of a micrometer. These scanners work on the principle of laser triangulation and use a camera chip in the receive path. The captured pixel data is then processed on an FPGA to generate 3D profile data. In order to do this, the laser line, as seen by the camera, must be extracted from the pixel data. For this task, several methods have been proposed. One of these methods employs an FIR filter to calculate the derivative of the incoming pixel stream orthogonally to the laser line direction. Afterwards, the zero crossing of this derivative is detected. The position of the zero crossing marks the position of the laser line in the camera image. From this position, the distance of the laser scanner to the scanned object can be derived.
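The FIR-based method can be sketched for a single pixel column as follows. This Python example is only a behavioral illustration under simple assumptions: a three-tap central-difference filter as the derivative kernel, a single intensity peak per column, and linear interpolation for the sub-pixel position. The function names are hypothetical, and the actual implementation is to be done in VHDL.

```python
from typing import List, Optional

DERIVATIVE_TAPS = [0.5, 0.0, -0.5]  # central difference, assumed example kernel


def fir_valid(samples: List[float], taps: List[float]) -> List[float]:
    """FIR filter (convolution) over the positions where all taps overlap."""
    n = len(taps)
    return [
        sum(taps[k] * samples[i + n - 1 - k] for k in range(n))
        for i in range(len(samples) - n + 1)
    ]


def laser_line_position(column: List[float]) -> Optional[float]:
    """Sub-pixel peak position from the zero crossing of the derivative."""
    deriv = fir_valid(column, DERIVATIVE_TAPS)
    offset = (len(DERIVATIVE_TAPS) - 1) / 2  # filter output i is centered here
    for i in range(len(deriv) - 1):
        if deriv[i] > 0 and deriv[i + 1] <= 0:
            # Linear interpolation between the two samples around the crossing
            frac = deriv[i] / (deriv[i] - deriv[i + 1])
            return i + offset + frac
    return None
```

For the symmetric profile `[0, 1, 4, 9, 4, 1, 0]` this returns 3.0, and for `[0, 2, 8, 8, 2, 0]` it returns 2.5, i.e. a position between two pixels. Noisy data can produce several crossings, which is one reason prefiltering may be necessary.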

Approximate computing is an emerging design paradigm that trades accuracy for reduced resource consumption, i.e. a certain inaccuracy of the calculations is allowed with the goal of reducing the overall resource consumption of the implemented design. In this thesis, such approximation methods should be integrated into the data processing pipeline and the results should be evaluated.

This thesis includes the implementation of a simple data processing pipeline for the extraction of the laser line from pixel data using an FIR filter-based approach. The implementation should be done in VHDL. Furthermore, the necessity for prefiltering (e.g. smoothing) of the pixel data should be assessed and implemented if necessary. Finally, the potential for the integration of approximate computing methods into the data processing pipeline should be evaluated.

Prerequisites

The student should have the following skills in order to successfully complete the thesis.

  • Good programming skills in VHDL
  • A basic understanding of FIR filter design
  • A basic understanding of image processing
  • The ability to work independently
  • Previous experience with approximate computing is helpful, but not strictly required

The student can work on the thesis remotely from their home office.

Contact

Arne Kreddig
Doctoral Candidate and FPGA Design Engineer
SmartRay GmbH


arne.kreddig@smartray.com

Supervisor:

Arne Kreddig - (SmartRay GmbH)

Graph Neural Network-based Pruning

Description

Convolutional neural networks (CNNs) are the de facto standard for many computer vision (CV) applications. These range from medical technology and robotics to autonomous driving. However, most modern CNNs are very memory and compute intensive, particularly when they are dimensioned for complex CV problems.


Compressing neural networks is essential for a variety of real-world applications. Pruning is a widely used technique for reducing the complexity of a neural network by removing redundant and superfluous parameters. One characteristic of this approach is the pruning granularity, which describes the substructures that should be removed from the neural network. Another aspect is the method for finding the redundant and unused structures, which plays a central role in effective pruning without loss of task-related accuracy. The optimization goal determines which elements (kernel, filter, channel) can be removed from the topology of the CNN.

The goal of this work is to learn the internal relationships between the channels, filters, and kernels of the layers by means of a graph neural network, and to identify their relevance to the classification task of the CNN. The learned relationships are then used for pruning the neural network.
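For orientation, the Python sketch below shows filter-granularity pruning with a simple magnitude criterion (the L1 norm of each filter), which is a common baseline and not the graph-neural-network method this topic aims at. The relevance score is passed in as a function so that a learned score could replace it. The function names and the nested-list weight layout are assumptions.

```python
from typing import Callable, List, Sequence


def l1_norm(weights) -> float:
    """Sum of absolute values over an arbitrarily nested list of weights."""
    if isinstance(weights, (int, float)):
        return abs(float(weights))
    return sum(l1_norm(w) for w in weights)


def filters_to_prune(
    layer_filters: Sequence,
    prune_ratio: float,
    score: Callable[[object], float] = l1_norm,
) -> List[int]:
    """Return the indices of the lowest-scoring filters of one layer."""
    scores = [score(f) for f in layer_filters]
    n_prune = int(len(layer_filters) * prune_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:n_prune])
```

Removing a filter also removes the corresponding input channel of the following layer, which is one of the structural relationships a graph-based view of the network can make explicit.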

Prerequisites

To successfully complete this project, you should have the following skills and experiences:

  • Good programming skills in Python and TensorFlow
  • Good knowledge of neural networks, particularly convolutional neural networks

The student is expected to be highly motivated and independent.

Contact

Nael Fasfous
Department of Electrical and Computer Engineering
Chair of Integrated Systems

Phone: +49.89.289.23858
Building: N1 (Theresienstr. 90)
Room: N2116
Email: nael.fasfous@tum.de

Supervisor:

Alexander Frickenstein, Nael Yousef Abdullah Al-Fasfous