Here you can find all available student positions at our chair. We offer Master's theses, Bachelor's theses, research internships, industry internships, and interdisciplinary projects. If you cannot find a suitable offering, please contact one of our research staff members. More information about the research topics of our chair can be found under Research. Furthermore, we offer seminar topics.


Bachelor's Theses

Acceleration of Artificial Netlist Generation

Description

Data-driven methods are the dominant modeling approaches nowadays. Machine learning approaches, such as graph neural networks, are applied to classic EDA problems (e.g., power modeling [1]). To ensure transferability between different circuit designs, the models have to be trained on diverse datasets. These must include various circuit designs covering different characteristic corners for measures such as timing or power dissipation.

A major obstacle for academic research here is the lack of freely available circuits. Online collections such as OpenCores [2] exist, but it is questionable whether they cover the various design corners that make models robust. This is where the generation of artificial netlists can help. Frameworks for this purpose target the automatic generation of random circuit designs whose only use case is to exhibit realistic behavior to EDA tools; they have no other usable functionality. Originally applied to evaluate classical EDA tools, artificial netlist generator (ANG) frameworks have recently been developed especially for machine learning targets [3].

Since large netlists also need to be included in such datasets, the ANG implementation itself must be able to generate netlists with large cell counts in reasonable time. The focus of this project is to identify the time-consuming steps of an existing ANG implementation. Based on this analysis, the implementation should be modified to accelerate the generation run.
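
A possible first step for the runtime analysis is a coarse measurement harness that records how the generation time scales with the requested cell count, before a profiler is applied to individual steps. The sketch below is purely illustrative: it assumes a hypothetical ang command-line binary with --cells and --out options, which are not prescribed by the actual implementation.

import subprocess
import time

ANG_BINARY = "./ang"  # hypothetical path to the existing ANG executable

def time_generation(cell_count: int) -> float:
    # Run one netlist generation and return the wall-clock time in seconds.
    start = time.perf_counter()
    subprocess.run(
        [ANG_BINARY, "--cells", str(cell_count), "--out", f"netlist_{cell_count}.v"],
        check=True,
    )
    return time.perf_counter() - start

# Sweep the cell count to expose super-linear scaling of the generator.
for cells in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{cells:>9} cells: {time_generation(cells):8.2f} s")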

References:

[1] Zhang, Yanqing; Ren, Haoxing; Khailany, Brucek. GRANNITE: Graph neural network inference for transferable power estimation. In: 2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 2020. pp. 1-6.

[2] https://opencores.org/

[3] Kim, Daeyeon, et al. Construction of realistic place-and-route benchmarks for machine learning applications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, vol. 42, no. 6, pp. 2030-2042.

Prerequisites

- interest in software development for electronic circuit design automation

- solid knowledge of digital circuit design

- profound knowledge of C++

- ability to work independently

Contact

If you are interested in this topic, please send your application to:

philipp.fengler@tum.de

Supervisor:

Philipp Fengler

Master's Theses

Fine-grained Exploration and Optimization of Deployment Parameters for Efficient Execution of Machine Learning Tasks on Microcontrollers

Description

Motivation

HW/SW Codesign, a technique that has been around for several decades, allows hardware designers to take the target application into consideration and further enables software engineers to start developing and testing firmware before the actual hardware becomes available. This can drastically reduce the time-to-market for new products and also comes with lower error rates compared to conventional development cycles. Virtual prototyping is an important component of the typical HW/SW-Codesign flow, as software can be simulated at several abstraction levels (RTL level, instruction level, functional level) at early development stages, not only to find potential hardware/software bugs but also to gain important information regarding expected performance and efficiency metrics such as runtime, latency, and utilization.

Due to the increasing relevance of machine learning applications in our everyday lives, the co-design and co-optimization of hardware and models (HW/Model-Codesign) have become more popular: instead of only the C/C++ code to be executed on the target device, the model architecture and training aspects are aligned with the hardware to be designed, or vice versa. However, due to the high complexity of today's machine learning frameworks and software compilers, deployment-related parameters also play a growing role; these should be investigated and exploited in this thesis.

Technical Background
The Embedded System Level (ESL) group at the EDA chair has a deep background in virtual prototyping techniques. Recently, embedded machine learning has become a highly exciting field of research.
We work primarily in an open-source software ecosystem. ETISS [1] is the instruction set simulator that allows us to evaluate various embedded applications for different ISAs (nowadays mainly RISC-V). Apache TVM [2] has been our ML deployment framework of choice for several years now, especially due to its MicroTVM subproject. Our TinyML deployment and benchmarking framework MLonMCU [3] is a powerful tool that enables us to evaluate different tool configurations fully automatically. However, the actual candidates for evaluation need to be chosen manually, which can lead to suboptimal results.

Task Description
In this project, an automated exploration of deployment parameters should be established. Further, state-of-the-art optimization techniques must be utilized to find optimal sets of parameters in the high-dimensional search space in an acceptable amount of time (an exhaustive search is not feasible). The optimization should take multiple deployment metrics (for example, total runtime or memory footprint) into account, yielding a multi-objective optimization flow.
The algorithms to be implemented should build on the existing tooling for prototyping and benchmarking TinyML models developed at the EDA chair (ETISS & MLonMCU). Where available, existing libraries/packages for (hyper-parameter) optimization (for example, Optuna [4] or XGBoost [5]) can be utilized.

First, a customizable objective function is required, which can be calculated, for example, as the weighted sum of relevant metrics determined using MLonMCU, and which should later be integrated into the optimization algorithms.
To keep the complexity of the task low, the considered hardware and machine learning models can be assumed to be fixed. The workloads are provided as already trained (and compressed) models. The focus will thereby be solely on the deployment aspects of the machine learning applications, which are mostly defined by the machine learning and software compilers used. The search space grows heavily with the number of considered free variables, which can be of different types (for example, categorical, discrete, sequential, …). Some examples are:
- Used data/kernel layout for convolution operations (NCHW, NHWC, HWIO, OHWI, …)
- Choice of kernel implementation (trivial, fallback, tuned, 3rd-party kernel library, external, accelerator, …)
- Compiler flags (-O3/-Os/…, -fno-unroll, …)
It might turn out that some decisions are helpful for some layers, while other layers would profit from slightly or heavily different sets of parameters. Therefore, it should be possible to perform the exploration in a per-layer fashion, which could yield even better results.
The optimization and exploration flow shall be visualized for the user (for example, via Pareto plots) and executed in an efficient way to make good use of the available resources on our compute servers (utilizing the parallel processing and remote-execution features provided by MLonMCU).
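
To illustrate how such a flow could be wired together, the following sketch sets up a small multi-objective study with Optuna [4]. It is a minimal sketch under stated assumptions: run_mlonmcu is a hypothetical stand-in for a real MLonMCU benchmark invocation (here returning dummy numbers so the snippet is self-contained), and the parameter names are illustrative examples, not the definitive search space.

import optuna

def run_mlonmcu(params: dict) -> dict:
    # Hypothetical stand-in for a real MLonMCU benchmark run; returns
    # dummy metrics so that this sketch is self-contained and executable.
    runtime = {"fallback": 3.0, "tuned": 1.5, "muriscvnn": 1.0}[params["kernel"]]
    rom = 900_000 if params["opt_level"] == "-O3" else 700_000
    if params["layout"] == "NHWC":
        runtime *= 0.9
    return {"runtime_s": runtime, "rom_bytes": rom}

def objective(trial: optuna.Trial):
    params = {
        "layout": trial.suggest_categorical("layout", ["NCHW", "NHWC"]),
        "kernel": trial.suggest_categorical("kernel", ["fallback", "tuned", "muriscvnn"]),
        "opt_level": trial.suggest_categorical("opt_level", ["-O3", "-Os"]),
    }
    metrics = run_mlonmcu(params)
    # Two objectives, both minimized: total runtime and ROM footprint.
    return metrics["runtime_s"], metrics["rom_bytes"]

study = optuna.create_study(directions=["minimize", "minimize"])
study.optimize(objective, n_trials=50)
print(study.best_trials)  # Pareto-optimal parameter sets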

Work Outline
1. Literature research
2. Set up the toolset (MLonMCU → ETISS + TVM + muRISCV-NN)
3. Describe customizable objective/score functions for the optimization algorithm
4. Define search space(s) for the deployment-parameter exploration
5. Develop an automated exploration and optimization flow around the MLonMCU tool which can take a batch of parameters and return the metrics used as inputs of the objective function
6. Investigate the potential of fine-grained (per-layer) optimization compared to a holistic (end-to-end) approach
7. Optional: Introduce constraints (for example, ROM footprint < 1 MB) to remove illegal candidates from the search space (and potentially skip the time-consuming execution of such candidates)
8. Optional: Allow fast estimation of deployment metrics by training a cost model based on previous experiments.
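
One possible realization of the optional cost model in step 8 is a regression model trained on records of earlier runs. The sketch below uses XGBoost [5] on invented, randomly generated stand-in data purely for illustration; the feature encoding and data volume are assumptions, not part of the task.

import numpy as np
import xgboost as xgb

# Invented example: encode earlier experiments as feature vectors
# (e.g., one-hot deployment parameters) with measured runtimes as labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6)).astype(float)  # 200 past configurations
y = X @ np.array([1.0, 0.5, 2.0, 0.1, 0.3, 0.8]) + rng.normal(0, 0.05, 200)

model = xgb.XGBRegressor(n_estimators=100, max_depth=4)
model.fit(X[:150], y[:150])

# Estimate metrics of unseen candidates instead of simulating each of them.
pred = model.predict(X[150:])
print("mean abs. error [s]:", np.abs(pred - y[150:]).mean())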

References
[1] Mueller-Gritschneder, D., Devarajegowda, K., Dittrich, M., Ecker, W., Greim, M., & Schlichtmann, U. (2017, October). The extendable translating instruction set simulator (ETISS) interlinked with an MDA framework for fast RISC prototyping. In Proceedings of the 28th International Symposium on Rapid System Prototyping: Shortening the Path from Specification to Prototype (pp. 79-84). GitHub: https://github.com/tum-ei-eda/etiss
[2] Chen, T., Moreau, T., Jiang, Z., Shen, H., Yan, E. Q., Wang, L., ... & Krishnamurthy, A. (2018). TVM: end-to-end optimization stack for deep learning. arXiv preprint arXiv:1802.04799. GitHub: https://github.com/apache/tvm
[3] van Kempen, P., Stahl, R., Mueller-Gritschneder, D., & Schlichtmann, U. (2023, September). MLonMCU: TinyML benchmarking with fast retargeting. In Proceedings of the 2023 Workshop on Compilers, Deployment, and Tooling for Edge AI (pp. 32-36). GitHub: https://github.com/tum-ei-eda/mlonmcu
[4] Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019, July). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2623-2631). GitHub: https://github.com/optuna/optuna
[5] Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). GitHub: https://github.com/dmlc/xgboost

Contact

Philipp van Kempen

Supervisor:

Philipp van Kempen

Hardware-based Memory Safety in RISC-V

Description

Memory safety bugs, e.g., buffer overflows or use-after-free, remain in the top ranks of security vulnerabilities. New hardware extensions such as the ARM Memory Tagging Extension help as a mitigation, but are not yet available for all architectures. In this work, you will analyze and compare different methods of implementing hardware-based memory safety and their advantages/disadvantages. You will then implement hardware support for memory safety on RISC-V hardware. The work done in this thesis is part of the Chip Design Center Bayern Innovative, which helps build an independent chip design infrastructure in Bavaria. In this project, Fraunhofer AISEC helps to develop secure RISC-V hardware and encourages publication of the final results.

Prerequisites

The following list of prerequisites is not exhaustive, but it should give you an idea of what is expected.

  • Experience in a hardware description language like VHDL/Verilog
  • Basic knowledge of computer architectures and embedded systems programming
  • Basic knowledge in C/C++ to use our instrumentation and evaluation framework

Contact

Please apply to:


Fraunhofer AISEC
Lichtenbergstraße 11
85748 München
Konrad Hohentanner
or via email: konrad.hohentanner@aisec.fraunhofer.de

Please attach your current grade report and CV to your application.


Supervisor:

Johannes Geier - konrad.hohentanner@aisec.fraunhofer.de (Fraunhofer AISEC)

Algorithm-based Error Detection for Hardware-Accelerated ANNs

Description

Artificial Neural Networks (ANNs) are increasingly being deployed in safety-critical settings, e.g., automotive systems and their platforms. Various fault tolerance/detection methods can be adopted to ensure that the computation of an ML network's inference is reliable. A state-of-the-art solution is redundancy, where a computation is performed multiple times and the respective results are compared. This can be done sequentially or concurrently, e.g., through lock-stepped processors. However, this redundancy introduces a significant overhead to the system: the computational demand is multiplied, whether in execution time or in processing nodes. To mitigate this overhead, several Algorithm-based Error Detection (ABED) approaches can be taken; among these, the following should be considered in this work:

  1. Selective Hardening: Only the most vulnerable parts (layers) are duplicated.
  2. Checksums: Redundancy for linear operations can be achieved with checksums, which mitigate the overhead by introducing redundancy into the algorithms themselves, e.g., filter and input checksums for convolutions [1] and fully connected (dense) layers [2]; a minimal illustrative sketch follows after this list.
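
To make the checksum idea concrete, the following sketch (in Python/NumPy, purely illustrative and not part of the referenced deployment flow) applies the classic Huang-Abraham scheme [2] to a matrix multiplication: the column sums of A and row sums of B are carried along as an extra row/column, and a mismatch between the result's checksums and the recomputed sums flags an error.

import numpy as np

def checked_matmul(a: np.ndarray, b: np.ndarray, tol: float = 1e-6):
    # Multiply a @ b with Huang-Abraham style checksums for error detection.
    a_full = np.vstack([a, a.sum(axis=0)])                 # append column-sum row
    b_full = np.hstack([b, b.sum(axis=1, keepdims=True)])  # append row-sum column

    c_full = a_full @ b_full
    c = c_full[:-1, :-1]

    # The checksum row/column of the result must match the sums of C.
    row_ok = np.allclose(c_full[-1, :-1], c.sum(axis=0), atol=tol)
    col_ok = np.allclose(c_full[:-1, -1], c.sum(axis=1), atol=tol)
    return c, row_ok and col_ok

rng = np.random.default_rng(0)
a, b = rng.random((4, 3)), rng.random((3, 5))
c, ok = checked_matmul(a, b)
print("error-free:", ok)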

The goals of this project are:

  • Integrate an existing ABED-enhanced ML compiler into an industrial ANN deployment flow,
  • design an experimental evaluation to test the performance impact of 1. and 2. on an industrial HW/SW setup, and
  • conduct statistical fault injection experiments [3] to measure the error mitigation of 1. and 2.

Related Work:
[1] S. K. S. Hari, M. B. Sullivan, T. Tsai, and S. W. Keckler, "Making Convolutions Resilient Via Algorithm-Based Error Detection Techniques," in IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 4, pp. 2546-2558, 1 July-Aug. 2022, doi: 10.1109/TDSC.2021.3063083.
[2] Kuang-Hua Huang and J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix Operations," in IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984, doi: 10.1109/TC.1984.1676475.
[3] R. Leveugle, A. Calvez, P. Maistri, and P. Vanhauwaert, "Statistical fault injection: Quantified error and confidence," 2009 Design, Automation & Test in Europe Conference & Exhibition, Nice, France, 2009, pp. 502-506, doi: 10.1109/DATE.2009.5090716.


Prerequisites

  • Good understanding of Data Flow Graphs, Scheduling, etc.
  • Good understanding of ANNs
  • Good knowledge of Linux, (embedded) C/C++, Python
  • Basic understanding of Compilers, preferably TVM and LLVM

This work will be conducted in cooperation with Infineon, Munich.


Contact

Please apply to:

johannes.geier@tum.de

Please attach your current transcript of records (grade report) and CV to your application.

Supervisor:

Johannes Geier

Cadence Internship Position for AI/ML-assisted Functional Verification

Description

See the attached PDF.

Contact

marion@cadence.com

Supervisor:

Ulf Schlichtmann - (Cadence)

Integration of Deep Learning Backends Using Collage

Short Description:
The thesis will contribute to the research on Collage for integration of Deep Learning (DL) backends and provide insights into the challenges in this field.

Description

The thesis will contribute to the research on Collage for the integration of Deep Learning (DL) backends and provide insights into the challenges in this field. The strong demand for efficient and performant deployment of DL applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast advancement, it is crucial for modern DL frameworks to efficiently integrate a variety of optimized tensor algebra libraries and runtimes as their backends and to generate the fastest possible executable using these backends. However, current DL frameworks require significant manual effort and expertise to integrate every new backend while failing to unleash its full potential.

Given the fast-evolving nature of the DL ecosystem, this manual approach often slows down continuous innovation across different layers: it prevents hardware vendors from quickly deploying their cutting-edge libraries, DL framework developers must repeatedly adjust their hand-coded rules to accommodate new versions of libraries, and machine learning practitioners have to wait for the integration of new technologies and often encounter unsatisfactory performance.

Collage is a DL framework that offers seamless integration of DL backends. It provides an expressive backend registration interface that allows users to precisely specify the capabilities of various backends. By leveraging the specifications of the available backends, Collage automatically searches for an optimized backend placement strategy for a given workload and execution environment.

Your work:

  1. Conduct a comprehensive literature review on Collage and similar frameworks
  2. Conduct experiments with Collage on a heterogeneous system, including UMA-integrated backends in TVM

Prerequisites

  • Fundamental understanding of neural networks and embedded systems
  • Basic understanding of TVM compiler
  • Experience in programming C/C++ and Python
  • Self-motivation and ability to work independently 

Contact

If you are interested in this topic, please contact me at samira.ahmadifarsani@tum.de.

Supervisor:

Samira Ahmadifarsani

Memory-Oriented Approaches for Deployment of DNNs on Low-Cost Edge Heterogeneous Systems

Short Description:
The thesis will contribute to the research on memory-centric approaches for the deployment of DNN models on resource-constrained heterogeneous systems and provide insights into the existing challenges in this field.

Description

The thesis will contribute to the research on memory-centric approaches for the deployment of DNN models on resource-constrained heterogeneous systems and provide insights into the existing challenges in this field. In recent years, the rapid growth of Artificial Intelligence (AI) and the explosion of hardware devices with AI-specific features have led to a rising demand for tools and frameworks capable of translating Deep Learning models from high-level languages like Python into lower-level code optimized for a particular hardware target, often in C. This thesis focuses on edge heterogeneous systems that have limited computational capabilities and memory and that prioritize energy efficiency.

The proliferation of diverse hardware platforms and programming ecosystems makes porting AI models to every device a non-trivial task. An ideal solution would be a universal tool that can translate high-level model representations, e.g., in Python, into low-level code while accommodating various hardware constraints, programming languages, and interfaces. Unfortunately, achieving this goal without compromising performance is still a challenge. For example, the TVM compiler stack is a popular open-source toolchain for deploying networks on many devices, including CPUs, GPUs, and ARM- and RISC-V-based microcontrollers (MCUs), but it falls short when generating code for heterogeneous Systems-on-Chip (SoCs) containing different accelerators. Recent efforts have focused on integrating TVM with memory-oriented deployment frameworks like DORY [1] and ZigZag [2], aiming to address these challenges.

[1] Van Delm, et al. "HTVM: Efficient neural network deployment on heterogeneous TinyML platforms." In 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. IEEE, 2023.

[2] Hamdi, Mohamed Amine. "Integrating Design Space Exploration in Modern Compilation Toolchains for Deep Learning." PhD diss., Politecnico di Torino, 2023.

Your work:
1. Conduct a comprehensive literature review of existing works.
2. Compare the references to identify gaps and unresolved challenges.
3. Investigate the integration flow of references [1] and [2] in TVM.
4. Work on integrating the approaches outlined in references [1] and [2] using the UMA framework within TVM.

Prerequisites

Requirements:

  • Fundamental understanding of neural networks and embedded systems
  • Basic understanding of TVM compiler
  • Experience in programming C/C++ and Python
  • Self-motivation and ability to work independently 

Contact

If you are interested in this topic, please contact me at samira.ahmadifarsani@tum.de.

Supervisor:

Samira Ahmadifarsani

Interdisciplinary Projects

Vector Graphics Generation from XML Descriptions of Chip Modules

Description

Project Description

In this project, students will develop a Java program that reads an XML file describing various characteristics of a chip module and generates corresponding SVG graphics based on the provided descriptions. This project aims to enhance students’ understanding of XML parsing, SVG graphics creation, and Java programming. By the end of the project, students will have a functional tool that can visualize chip modules dynamically.

Objectives

  • To understand and implement XML parsing in Java.
  • To learn the basics of SVG graphics and how to generate them programmatically.
  • To develop a Java application that integrates XML data with SVG output.
  • To enhance problem-solving and programming skills in Java.

Prerequisites

Students should have the following skills and knowledge before starting this project:

  • Basic Java Programming: Understanding of Java syntax, object-oriented programming concepts, and basic data structures.
  • XML Basics: Familiarity with XML structure and how to read/write XML files.
  • SVG Fundamentals: Basic knowledge of SVG (Scalable Vector Graphics) and its elements.
  • Problem-Solving Skills: Ability to break down complex problems into manageable tasks and implement solutions.

Project Tasks

  1. XML Parsing: Write a Java program to read and parse the XML file containing chip module descriptions.
  2. SVG Generation: Develop methods to convert parsed XML data into SVG graphics.
  3. Integration: Combine XML parsing and SVG generation into a cohesive Java application.
  4. Testing and Debugging: Test the application with various XML files and debug any issues that arise.
  5. Documentation: Document the code and provide a user guide for the application.
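
To sketch the intended XML-to-SVG data flow (shown here in Python for brevity; the project itself is to be implemented in Java), the snippet below parses a made-up module description and emits one labeled SVG rectangle per module. The element and attribute names are invented for illustration and do not reflect the actual input format provided with the project.

import xml.etree.ElementTree as ET

# Hypothetical input format; the real XML schema is provided with the project.
XML = """
<chip>
  <module name="ALU" x="10" y="10" width="80" height="40"/>
  <module name="Cache" x="10" y="60" width="80" height="60"/>
</chip>
"""

def chip_to_svg(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">']
    for mod in root.iter("module"):
        x, y = mod.get("x"), mod.get("y")
        w, h = mod.get("width"), mod.get("height")
        # One labeled rectangle per chip module.
        parts.append(f'<rect x="{x}" y="{y}" width="{w}" height="{h}" '
                     'fill="lightsteelblue" stroke="black"/>')
        parts.append(f'<text x="{x}" y="{y}">{mod.get("name")}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

print(chip_to_svg(XML))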

Prerequisites

See above.

Contact

Yushen.Zhang+Project@cs.tum.edu

Supervisor:

Yushen Zhang

Fine-grained Exploration and Optimization of Deployment Parameters for Efficient Execution of Machine Learning Tasks on Microcontrollers

Description

Motivation

HW/SW Codesign, a technique that has been around for several decades, allows hardware designers to take the target application into consideration and further enables software engineers to start developing and testing firmware before the actual hardware becomes available. This can drastically reduce the time-to-market for new products and also comes with lower error rates compared to conventional development cycles. Virtual prototyping is an important component of the typical HW/SW-Codesign flow, as software can be simulated at several abstraction levels (RTL level, instruction level, functional level) at early development stages, not only to find potential hardware/software bugs but also to gain important information regarding expected performance and efficiency metrics such as runtime, latency, and utilization.

Due to the increasing relevance of machine learning applications in our everyday lives, the co-design and co-optimization of hardware and models (HW/Model-Codesign) have become more popular: instead of only the C/C++ code to be executed on the target device, the model architecture and training aspects are aligned with the hardware to be designed, or vice versa. However, due to the high complexity of today's machine learning frameworks and software compilers, deployment-related parameters also play a growing role; these should be investigated and exploited in this thesis.

Technical Background
The Embedded System Level (ESL) group at the EDA chair has a deep background in virtual prototyping techniques. Recently, embedded machine learning has become a highly exciting field of research.
We work primarily in an open-source software ecosystem. ETISS [1] is the instruction set simulator that allows us to evaluate various embedded applications for different ISAs (nowadays mainly RISC-V). Apache TVM [2] has been our ML deployment framework of choice for several years now, especially due to its MicroTVM subproject. Our TinyML deployment and benchmarking framework MLonMCU [3] is a powerful tool that enables us to evaluate different tool configurations fully automatically. However, the actual candidates for evaluation need to be chosen manually, which can lead to suboptimal results.

Task Description
In this project, an automated exploration of deployment parameters should be established. Further, state-of-the-art optimization techniques must be utilized to find optimal sets of parameters in the high-dimensional search space in an acceptable amount of time (an exhaustive search is not feasible). The optimization should take multiple deployment metrics (for example, total runtime or memory footprint) into account, yielding a multi-objective optimization flow.
The algorithms to be implemented should build on the existing tooling for prototyping and benchmarking TinyML models developed at the EDA chair (ETISS & MLonMCU). Where available, existing libraries/packages for (hyper-parameter) optimization (for example, Optuna [4] or XGBoost [5]) can be utilized.

First, a customizable objective function is required, which can be calculated, for example, as the weighted sum of relevant metrics determined using MLonMCU, and which should later be integrated into the optimization algorithms.
To keep the complexity of the task low, the considered hardware and machine learning models can be assumed to be fixed. The workloads are provided as already trained (and compressed) models. The focus will thereby be solely on the deployment aspects of the machine learning applications, which are mostly defined by the machine learning and software compilers used. The search space grows heavily with the number of considered free variables, which can be of different types (for example, categorical, discrete, sequential, …). Some examples are:
- Used data/kernel layout for convolution operations (NCHW, NHWC, HWIO, OHWI, …)
- Choice of kernel implementation (trivial, fallback, tuned, 3rd-party kernel library, external, accelerator, …)
- Compiler flags (-O3/-Os/…, -fno-unroll, …)
It might turn out that some decisions are helpful for some layers, while other layers would profit from slightly or heavily different sets of parameters. Therefore, it should be possible to perform the exploration in a per-layer fashion, which could yield even better results.
The optimization and exploration flow shall be visualized for the user (for example, via Pareto plots) and executed in an efficient way to make good use of the available resources on our compute servers (utilizing the parallel processing and remote-execution features provided by MLonMCU).

Work Outline
1. Literature research
2. Set up the toolset (MLonMCU → ETISS + TVM + muRISCV-NN)
3. Describe customizable objective/score functions for the optimization algorithm
4. Define search space(s) for the deployment-parameter exploration
5. Develop an automated exploration and optimization flow around the MLonMCU tool which can take a batch of parameters and return the metrics used as inputs of the objective function
6. Investigate the potential of fine-grained (per-layer) optimization compared to a holistic (end-to-end) approach
7. Optional: Introduce constraints (for example, ROM footprint < 1 MB) to remove illegal candidates from the search space (and potentially skip the time-consuming execution of such candidates)
8. Optional: Allow fast estimation of deployment metrics by training a cost model based on previous experiments.

References
[1] Mueller-Gritschneder, D., Devarajegowda, K., Dittrich, M., Ecker, W., Greim, M., & Schlichtmann, U. (2017, October). The extendable translating instruction set simulator (ETISS) interlinked with an MDA framework for fast RISC prototyping. In Proceedings of the 28th International Symposium on Rapid System Prototyping: Shortening the Path from Specification to Prototype (pp. 79-84). GitHub: https://github.com/tum-ei-eda/etiss
[2] Chen, T., Moreau, T., Jiang, Z., Shen, H., Yan, E. Q., Wang, L., ... & Krishnamurthy, A. (2018). TVM: end-to-end optimization stack for deep learning. arXiv preprint arXiv:1802.04799. GitHub: https://github.com/apache/tvm
[3] van Kempen, P., Stahl, R., Mueller-Gritschneder, D., & Schlichtmann, U. (2023, September). MLonMCU: TinyML benchmarking with fast retargeting. In Proceedings of the 2023 Workshop on Compilers, Deployment, and Tooling for Edge AI (pp. 32-36). GitHub: https://github.com/tum-ei-eda/mlonmcu
[4] Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019, July). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2623-2631). GitHub: https://github.com/optuna/optuna
[5] Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). GitHub: https://github.com/dmlc/xgboost

Contact

Philipp van Kempen

Supervisor:

Philipp van Kempen

Startup Microsystems: Innovate, Create, Compete – COSIMA Challenge

Keywords:
Microsystem, MEMS, Innovation, Creativity
Short Description:
This is a dynamic and hands-on internship designed to empower students to harness their creativity and technical skills to participate in the COSIMA (Competition of Students in Microsystems Applications) contest. This internship is not just an academic pursuit; it's a journey towards becoming an innovative entrepreneur in the realm of sensor and microsystem applications. At the end of the contest, you will receive the credits for an FP/IP/IDP.

Description

Welcome to "Startup Microsystems: Innovate, Create, Compete – COSIMA Challenge," a dynamic and hands-on internship designed to empower students in harnessing their creativity and technical skills to participate in the COSIMA (Competition of Students in Microsystems Applications) contest. This internship is not just an academic pursuit; it's a journey towards becoming innovative entrepreneurs in the realm of sensor and microsystem applications.

 

COSIMA is a German national student competition. 

 

Overview:

In this practical internship, students will delve into the world of microsystems, exploring their components, functionalities, and potential applications. The focus will be on fostering creativity and teamwork as students work collaboratively to conceive, design, and prototype innovative solutions using sensors and microsystems.

 

Key Features:

Creative Exploration: Unlike traditional courses and internships, this one offers the freedom to choose and define your own technical challenge. Students will be encouraged to think outside the box, identify real-world problems, and propose solutions that leverage microsystems to enhance human-technology interactions.

 

Hands-On Prototyping: The heart of the internship lies in turning ideas into reality. Students will actively engage in the prototyping process, developing functional prototypes of their innovative concepts. Emphasis will be placed on understanding the practical aspects of sensor integration, actuation, and control electronics.

 

COSIMA Contest Preparation: The internship will align with the COSIMA competition requirements, preparing students to present their prototypes on the competition day. Guidance will be provided on creating impactful presentations that showcase the ingenuity and practicality of their solutions.

 

Go International: The winners of COSIMA will qualify to take part in the international iCAN competition. Guidance and preparation for the iCAN will be provided.

 

Entrepreneurial Mindset: Drawing inspiration from successful startups that emerged from COSIMA, the internship will instill an entrepreneurial mindset. Students will learn about the essentials of founding a startup, from business planning to pitching their ideas.

 

Our past participation:

iCANX Competition 2024 (cosima-mems.de)

That was COSIMA 2023 (cosima-mems.de)

iCAN Competition 2023 (cosima-mems.de)

COSIMA 2022 winners (cosima-mems.de)

Prerequisites

Intermediate German and English language proficiency is required.

Contact

Supervisor:

Yushen Zhang

Research Internships (Forschungspraxis)

Acceleration of Artificial Netlist Generation

Description

Data-driven methods are the dominant modeling approaches nowadays. Machine learning approaches, such as graph neural networks, are applied to classic EDA problems (e.g., power modeling [1]). To ensure transferability between different circuit designs, the models have to be trained on diverse datasets. These must include various circuit designs covering different characteristic corners for measures such as timing or power dissipation.

A major obstacle for academic research here is the lack of freely available circuits. Online collections such as OpenCores [2] exist, but it is questionable whether they cover the various design corners that make models robust. This is where the generation of artificial netlists can help. Frameworks for this purpose target the automatic generation of random circuit designs whose only use case is to exhibit realistic behavior to EDA tools; they have no other usable functionality. Originally applied to evaluate classical EDA tools, artificial netlist generator (ANG) frameworks have recently been developed especially for machine learning targets [3].

Since large netlists also need to be included in such datasets, the ANG implementation itself must be able to generate netlists with large cell counts in reasonable time. The focus of this project is to identify the time-consuming steps of an existing ANG implementation. Based on this analysis, the implementation should be modified to accelerate the generation run.

References:

[1] Zhang, Yanqing; Ren, Haoxing; Khailany, Brucek. GRANNITE: Graph neural network inference for transferable power estimation. In: 2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 2020. pp. 1-6.

[2] https://opencores.org/

[3] Kim, Daeyeon, et al. Construction of realistic place-and-route benchmarks for machine learning applications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, vol. 42, no. 6, pp. 2030-2042.

Prerequisites

- interest in software development for electronic circuit design automation

- solid knowledge of digital circuit design

- profound knowledge of C++

- ability to work independently

Contact

If you are interested in this topic, please send your application to:

philipp.fengler@tum.de

Supervisor:

Philipp Fengler

Vector Graphics Generation from XML Descriptions of Chip Modules

Description

Project Description

In this project, students will develop a Java program that reads an XML file describing various characteristics of a chip module and generates corresponding SVG graphics based on the provided descriptions. This project aims to enhance students’ understanding of XML parsing, SVG graphics creation, and Java programming. By the end of the project, students will have a functional tool that can visualize chip modules dynamically.

Objectives

  • To understand and implement XML parsing in Java.
  • To learn the basics of SVG graphics and how to generate them programmatically.
  • To develop a Java application that integrates XML data with SVG output.
  • To enhance problem-solving and programming skills in Java.

Prerequisites

Students should have the following skills and knowledge before starting this project:

  • Basic Java Programming: Understanding of Java syntax, object-oriented programming concepts, and basic data structures.
  • XML Basics: Familiarity with XML structure and how to read/write XML files.
  • SVG Fundamentals: Basic knowledge of SVG (Scalable Vector Graphics) and its elements.
  • Problem-Solving Skills: Ability to break down complex problems into manageable tasks and implement solutions.

Project Tasks

  1. XML Parsing: Write a Java program to read and parse the XML file containing chip module descriptions.
  2. SVG Generation: Develop methods to convert parsed XML data into SVG graphics.
  3. Integration: Combine XML parsing and SVG generation into a cohesive Java application.
  4. Testing and Debugging: Test the application with various XML files and debug any issues that arise.
  5. Documentation: Document the code and provide a user guide for the application.

Prerequisites

See above.

Contact

Yushen.Zhang+Project@cs.tum.edu

Supervisor:

Yushen Zhang

Fine-grained Exploration and Optimization of Deployment Parameters for Efficient Execution of Machine Learning Tasks on Microcontrollers

Description

Motivation

HW/SW Codesign, a technique that has been around for several decades, allows hardware designers to take the target application into consideration and further enables software engineers to start developing and testing firmware before the actual hardware becomes available. This can drastically reduce the time-to-market for new products and also comes with lower error rates compared to conventional development cycles. Virtual prototyping is an important component of the typical HW/SW-Codesign flow, as software can be simulated at several abstraction levels (RTL level, instruction level, functional level) at early development stages, not only to find potential hardware/software bugs but also to gain important information regarding expected performance and efficiency metrics such as runtime, latency, and utilization.

Due to the increasing relevance of machine learning applications in our everyday lives, the co-design and co-optimization of hardware and models (HW/Model-Codesign) have become more popular: instead of only the C/C++ code to be executed on the target device, the model architecture and training aspects are aligned with the hardware to be designed, or vice versa. However, due to the high complexity of today's machine learning frameworks and software compilers, deployment-related parameters also play a growing role; these should be investigated and exploited in this thesis.

Technical Background
The Embedded System Level (ESL) group at the EDA chair has a deep background in virtual prototyping techniques. Recently, embedded machine learning has become a highly exciting field of research.
We work primarily in an open-source software ecosystem. ETISS [1] is the instruction set simulator that allows us to evaluate various embedded applications for different ISAs (nowadays mainly RISC-V). Apache TVM [2] has been our ML deployment framework of choice for several years now, especially due to its MicroTVM subproject. Our TinyML deployment and benchmarking framework MLonMCU [3] is a powerful tool that enables us to evaluate different tool configurations fully automatically. However, the actual candidates for evaluation need to be chosen manually, which can lead to suboptimal results.

Task Description
In this project, an automated exploration of deployment parameters should be established. Further, state-of-the-art optimization techniques must be utilized to find optimal sets of parameters in the high-dimensional search space in an acceptable amount of time (an exhaustive search is not feasible). The optimization should take multiple deployment metrics (for example, total runtime or memory footprint) into account, yielding a multi-objective optimization flow.
The algorithms to be implemented should build on the existing tooling for prototyping and benchmarking TinyML models developed at the EDA chair (ETISS & MLonMCU). Where available, existing libraries/packages for (hyper-parameter) optimization (for example, Optuna [4] or XGBoost [5]) can be utilized.

First, a customizable objective function is required, which can be calculated, for example, as the weighted sum of relevant metrics determined using MLonMCU, and which should later be integrated into the optimization algorithms.
To keep the complexity of the task low, the considered hardware and machine learning models can be assumed to be fixed. The workloads are provided as already trained (and compressed) models. The focus will thereby be solely on the deployment aspects of the machine learning applications, which are mostly defined by the machine learning and software compilers used. The search space grows heavily with the number of considered free variables, which can be of different types (for example, categorical, discrete, sequential, …). Some examples are:
- Used data/kernel layout for convolution operations (NCHW, NHWC, HWIO, OHWI, …)
- Choice of kernel implementation (trivial, fallback, tuned, 3rd-party kernel library, external, accelerator, …)
- Compiler flags (-O3/-Os/…, -fno-unroll, …)
It might turn out that some decisions are helpful for some layers, while other layers would profit from slightly or heavily different sets of parameters. Therefore, it should be possible to perform the exploration in a per-layer fashion, which could yield even better results.
The optimization and exploration flow shall be visualized for the user (for example, via Pareto plots) and executed in an efficient way to make good use of the available resources on our compute servers (utilizing the parallel processing and remote-execution features provided by MLonMCU).

Work Outline
1. Literature research
2. Set up the toolset (MLonMCU → ETISS + TVM + muRISCV-NN)
3. Describe customizable objective/score functions for the optimization algorithm
4. Define search space(s) for the deployment-parameter exploration
5. Develop an automated exploration and optimization flow around the MLonMCU tool which can take a batch of parameters and return the metrics used as inputs of the objective function
6. Investigate the potential of fine-grained (per-layer) optimization compared to a holistic (end-to-end) approach
7. Optional: Introduce constraints (for example, ROM footprint < 1 MB) to remove illegal candidates from the search space (and potentially skip the time-consuming execution of such candidates)
8. Optional: Allow fast estimation of deployment metrics by training a cost model based on previous experiments.

References
[1] Mueller-Gritschneder, D., Devarajegowda, K., Dittrich, M., Ecker, W., Greim, M., & Schlichtmann, U. (2017, October). The extendable translating instruction set simulator (ETISS) interlinked with an MDA framework for fast RISC prototyping. In Proceedings of the 28th International Symposium on Rapid System Prototyping: Shortening the Path from Specification to Prototype (pp. 79-84). GitHub: https://github.com/tum-ei-eda/etiss
[2] Chen, T., Moreau, T., Jiang, Z., Shen, H., Yan, E. Q., Wang, L., ... & Krishnamurthy, A. (2018). TVM: end-to-end optimization stack for deep learning. arXiv preprint arXiv:1802.04799. GitHub: https://github.com/apache/tvm
[3] van Kempen, P., Stahl, R., Mueller-Gritschneder, D., & Schlichtmann, U. (2023, September). MLonMCU: TinyML benchmarking with fast retargeting. In Proceedings of the 2023 Workshop on Compilers, Deployment, and Tooling for Edge AI (pp. 32-36). GitHub: https://github.com/tum-ei-eda/mlonmcu
[4] Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019, July). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2623-2631). GitHub: https://github.com/optuna/optuna
[5] Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). GitHub: https://github.com/dmlc/xgboost

Contact

Philipp van Kempen

Supervisor:

Philipp van Kempen

Hardware-based Memory Safety in RISC-V

Description

Memory safety bugs, e.g., buffer overflows or use-after-free, remain in the top ranks of security vulnerabilities. New hardware extensions such as the ARM Memory Tagging Extension help as a mitigation, but are not yet available for all architectures. In this work, you will analyze and compare different methods of implementing hardware-based memory safety and their advantages/disadvantages. You will then implement hardware support for memory safety on RISC-V hardware. The work done in this thesis is part of the Chip Design Center Bayern Innovative, which helps build an independent chip design infrastructure in Bavaria. In this project, Fraunhofer AISEC helps to develop secure RISC-V hardware and encourages publication of the final results.

Prerequisites

The following list of prerequisites is not exhaustive, but it should give you an idea of what is expected.

  • Experience in a hardware description language like VHDL/Verilog
  • Basic knowledge of computer architectures and embedded systems programming
  • Basic knowledge in C/C++ to use our instrumentation and evaluation framework

Contact

Please apply to:


Fraunhofer AISEC
Lichtenbergstraße 11
85748 München
Konrad Hohentanner
or via email: konrad.hohentanner@aisec.fraunhofer.de

Please attach your current grade report and CV to your application.


Supervisor:

Johannes Geier - konrad.hohentanner@aisec.fraunhofer.de (Fraunhofer AISEC)

Algorithm-based Error Detection for Hardware-Accelerated ANNs

Description

Artificial Neural Networks (ANNs) are increasingly being deployed in safety-critical settings, e.g., automotive systems and their platforms. Various fault tolerance/detection methods can be adopted to ensure that the computation of an ML network's inference is reliable. A state-of-the-art solution is redundancy, where a computation is performed multiple times and the respective results are compared. This can be done sequentially or concurrently, e.g., through lock-stepped processors. However, this redundancy introduces a significant overhead to the system: the computational demand is multiplied, whether in execution time or in processing nodes. To mitigate this overhead, several Algorithm-based Error Detection (ABED) approaches can be taken; among these, the following should be considered in this work:

  1. Selective Hardening: Only the most vulnerable parts (layers) are duplicated.
  2. Checksums: Redundancy for linear operations can be achieved with checksums, which mitigate the overhead by introducing redundancy into the algorithms themselves, e.g., filter and input checksums for convolutions [1] and fully connected (dense) layers [2].

The goals of this project are:

  • Integrate an existing ABED-enhanced ML compiler into an industrial ANN deployment flow,
  • design an experimental evaluation to test the performance impact of 1. and 2. on an industrial HW/SW setup, and
  • conduct statistical fault injection experiments [3] to measure the error mitigation of 1. and 2.

Related Work:
[1] S. K. S. Hari, M. B. Sullivan, T. Tsai, and S. W. Keckler, "Making Convolutions Resilient Via Algorithm-Based Error Detection Techniques," in IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 4, pp. 2546-2558, 1 July-Aug. 2022, doi: 10.1109/TDSC.2021.3063083.
[2] Kuang-Hua Huang and J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix Operations," in IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984, doi: 10.1109/TC.1984.1676475.
[3] R. Leveugle, A. Calvez, P. Maistri, and P. Vanhauwaert, "Statistical fault injection: Quantified error and confidence," 2009 Design, Automation & Test in Europe Conference & Exhibition, Nice, France, 2009, pp. 502-506, doi: 10.1109/DATE.2009.5090716.


Prerequisites

  • Good understanding of Data Flow Graphs, Scheduling, etc.
  • Good understanding of ANNs
  • Good knowledge of Linux, (embedded) C/C++, Python
  • Basic understanding of Compilers, preferably TVM and LLVM

This work will be conducted in cooperation with Infineon, Munich.


Contact

Please apply to:

johannes.geier@tum.de

Please attach your current transcript of records (grade report) and CV to your application.

Supervisor:

Johannes Geier

Startup Microsystems: Innovate, Create, Compete – COSIMA Challenge

Keywords:
Microsystem, MEMS, Innovation, Creativity
Short Description:
This is a dynamic and hands-on internship designed to empower students to harness their creativity and technical skills to participate in the COSIMA (Competition of Students in Microsystems Applications) contest. This internship is not just an academic pursuit; it's a journey towards becoming an innovative entrepreneur in the realm of sensor and microsystem applications. At the end of the contest, you will receive the credits for an FP/IP/IDP.

Description

Welcome to "Startup Microsystems: Innovate, Create, Compete – COSIMA Challenge," a dynamic and hands-on internship designed to empower students in harnessing their creativity and technical skills to participate in the COSIMA (Competition of Students in Microsystems Applications) contest. This internship is not just an academic pursuit; it's a journey towards becoming innovative entrepreneurs in the realm of sensor and microsystem applications.

 

COSIMA is a German national student competition. 

 

Overview:

In this practical internship, students will delve into the world of microsystems, exploring their components, functionalities, and potential applications. The focus will be on fostering creativity and teamwork as students work collaboratively to conceive, design, and prototype innovative solutions using sensors and microsystems.

 

Key Features:

Creative Exploration: Unlike traditional courses and internships, this one offers the freedom to choose and define your own technical challenge. Students will be encouraged to think outside the box, identify real-world problems, and propose solutions that leverage microsystems to enhance human-technology interactions.

 

Hands-On Prototyping: The heart of the internship lies in turning ideas into reality. Students will actively engage in the prototyping process, developing functional prototypes of their innovative concepts. Emphasis will be placed on understanding the practical aspects of sensor integration, actuation, and control electronics.

 

COSIMA Contest Preparation: The internship will align with the COSIMA competition requirements, preparing students to present their prototypes on the competition day. Guidance will be provided on creating impactful presentations that showcase the ingenuity and practicality of their solutions.

 

Go International: The winners of COSIMA will qualify to take part in the international iCAN competition. Guidance and preparation for the iCAN will be provided.

 

Entrepreneurial Mindset: Drawing inspiration from successful startups that emerged from COSIMA, the internship will instill an entrepreneurial mindset. Students will learn about the essentials of founding a startup, from business planning to pitching their ideas.

 

Our past participation:

iCANX Competition 2024 (cosima-mems.de)

That was COSIMA 2023 (cosima-mems.de)

iCAN Competition 2023 (cosima-mems.de)

COSIMA 2022 winners (cosima-mems.de)

Prerequisites

Intermediate German and English language proficiency is required.

Contact

Supervisor:

Yushen Zhang

Internships

Vector Graphics Generation from XML Descriptions of Chip Modules

Description

Project Description

In this project, students will develop a Java program that reads an XML file describing various characteristics of a chip module and generates corresponding SVG graphics based on the provided descriptions. This project aims to enhance students’ understanding of XML parsing, SVG graphics creation, and Java programming. By the end of the project, students will have a functional tool that can visualize chip modules dynamically.

Objectives

  • To understand and implement XML parsing in Java.
  • To learn the basics of SVG graphics and how to generate them programmatically.
  • To develop a Java application that integrates XML data with SVG output.
  • To enhance problem-solving and programming skills in Java.

Prerequisites

Students should have the following skills and knowledge before starting this project:

  • Basic Java Programming: Understanding of Java syntax, object-oriented programming concepts, and basic data structures.
  • XML Basics: Familiarity with XML structure and how to read/write XML files.
  • SVG Fundamentals: Basic knowledge of SVG (Scalable Vector Graphics) and its elements.
  • Problem-Solving Skills: Ability to break down complex problems into manageable tasks and implement solutions.

Project Tasks

  1. XML Parsing: Write a Java program to read and parse the XML file containing chip module descriptions.
  2. SVG Generation: Develop methods to convert parsed XML data into SVG graphics.
  3. Integration: Combine XML parsing and SVG generation into a cohesive Java application.
  4. Testing and Debugging: Test the application with various XML files and debug any issues that arise.
  5. Documentation: Document the code and provide a user guide for the application.

Prerequisites

See above.

Contact

Yushen.Zhang+Project@cs.tum.edu

Supervisor:

Yushen Zhang

Startup Microsystems: Innovate, Create, Compete – COSIMA Challenge

Keywords:
Microsystem, MEMS, Innovation, Creativity
Short Description:
This is a dynamic and hands-on internship designed to empower students to harness their creativity and technical skills to participate in the COSIMA (Competition of Students in Microsystems Applications) contest. This internship is not just an academic pursuit; it's a journey towards becoming an innovative entrepreneur in the realm of sensor and microsystem applications. At the end of the contest, you will receive the credits for an FP/IP/IDP.

Description

Welcome to "Startup Microsystems: Innovate, Create, Compete – COSIMA Challenge," a dynamic and hands-on internship designed to empower students in harnessing their creativity and technical skills to participate in the COSIMA (Competition of Students in Microsystems Applications) contest. This internship is not just an academic pursuit; it's a journey towards becoming innovative entrepreneurs in the realm of sensor and microsystem applications.

 

COSIMA is a German national student competition. 

 

Overview:

In this practical internship, students will delve into the world of microsystems, exploring their components, functionalities, and potential applications. The focus will be on fostering creativity and teamwork as students work collaboratively to conceive, design, and prototype innovative solutions using sensors and microsystems.

 

Key Features:

Creative Exploration: Unlike traditional courses and internships, this one offers the freedom to choose and define your own technical challenge. Students will be encouraged to think outside the box, identify real-world problems, and propose solutions that leverage microsystems to enhance human-technology interactions.

 

Hands-On Prototyping: The heart of the internship lies in turning ideas into reality. Students will actively engage in the prototyping process, developing functional prototypes of their innovative concepts. Emphasis will be placed on understanding the practical aspects of sensor integration, actuation, and control electronics.

 

COSIMA Contest Preparation: The internship will align with the COSIMA competition requirements, preparing students to present their prototypes on the competition day. Guidance will be provided on creating impactful presentations that showcase the ingenuity and practicality of their solutions.

 

Go International: The winners of COSIMA will qualify to take part in the international iCAN competition. Guidance and preparation for the iCAN will be provided.

 

Entrepreneurial Mindset: Drawing inspiration from successful startups that emerged from COSIMA, the internship will instill an entrepreneurial mindset. Students will learn about the essentials of founding a startup, from business planning to pitching their ideas.

 

Our past participation:

iCANX Competition 2024 (cosima-mems.de)

That was COSIMA 2023 (cosima-mems.de)

iCAN Competition 2023 (cosima-mems.de)

COSIMA 2022 winners (cosima-mems.de)

Prerequisites

Intermediate German and English language proficiency is required.

Contact

Supervisor:

Yushen Zhang

Student Assistant Jobs

Fine-grained Exploration and Optimization of Deployment Parameters for Efficient Execution of Machine Learning Tasks on Microcontrollers

Description

Motivation

HW/SW Codesign, a technique that has been around for several decades, allows hardware designers to take the target application into consideration and further enables software engineers to start developing and testing firmware before the actual hardware becomes available. This can drastically reduce the time-to-market for new products and also comes with lower error rates compared to conventional development cycles. Virtual prototyping is an important component of the typical HW/SW-Codesign flow, as software can be simulated at several abstraction levels (RTL level, instruction level, functional level) at early development stages, not only to find potential hardware/software bugs but also to gain important information regarding expected performance and efficiency metrics such as runtime, latency, and utilization.

Due to the increasing relevance of machine learning applications in our everyday lives, the co-design and co-optimization of hardware and models (HW/Model-Codesign) have become more popular: instead of only the C/C++ code to be executed on the target device, the model architecture and training aspects are aligned with the hardware to be designed, or vice versa. However, due to the high complexity of today's machine learning frameworks and software compilers, deployment-related parameters also play a growing role; these should be investigated and exploited in this thesis.

Technical Background
The Embedded System Level (ESL) group at the EDA chair has a deep background in virtual prototyping techniques. Recently, embedded machine learning has become a highly exciting field of research.
We work primarily in an open-source software ecosystem. ETISS [1] is the instruction set simulator that allows us to evaluate various embedded applications for different ISAs (nowadays mainly RISC-V). Apache TVM [2] has been our ML deployment framework of choice for several years now, especially due to its MicroTVM subproject. Our TinyML deployment and benchmarking framework MLonMCU [3] is a powerful tool that enables us to evaluate different tool configurations fully automatically. However, the actual candidates for evaluation need to be chosen manually, which can lead to suboptimal results.

Task Description
In this project, an automated exploration of deployment parameters should be established. Further, state-of-the-art optimization techniques must be utilized to find optimal sets of parameters in the high-dimensional search space in an acceptable amount of time (an exhaustive search is not feasible). The optimization should take multiple deployment metrics (for example, total runtime or memory footprint) into account, yielding a multi-objective optimization flow.
The algorithms to be implemented should build on the existing tooling for prototyping and benchmarking TinyML models developed at the EDA chair (ETISS & MLonMCU). Where available, existing libraries/packages for (hyper-parameter) optimization (for example, Optuna [4] or XGBoost [5]) can be utilized.

First, a customizable objective function is required, which can be calculated, for example, as the weighted sum of relevant metrics determined using MLonMCU, and which should later be integrated into the optimization algorithms.
To keep the complexity of the task low, the considered hardware and machine learning models can be assumed to be fixed. The workloads are provided as already trained (and compressed) models. The focus will thereby be solely on the deployment aspects of the machine learning applications, which are mostly defined by the machine learning and software compilers used. The search space grows heavily with the number of considered free variables, which can be of different types (for example, categorical, discrete, sequential, …). Some examples are:
- Used data/kernel layout for convolution operations (NCHW, NHWC, HWIO, OHWI, …)
- Choice of kernel implementation (trivial, fallback, tuned, 3rd-party kernel library, external, accelerator, …)
- Compiler flags (-O3/-Os/…, -fno-unroll, …)
It might turn out that some decisions are helpful for some layers, while other layers would profit from slightly or heavily different sets of parameters. Therefore, it should be possible to perform the exploration in a per-layer fashion, which could yield even better results.
The optimization and exploration flow shall be visualized for the user (for example, via Pareto plots) and executed in an efficient way to make good use of the available resources on our compute servers (utilizing the parallel processing and remote-execution features provided by MLonMCU).

Work Outline
1. Literature research
2. Set up the toolset (MLonMCU → ETISS + TVM + muRISCV-NN)
3. Describe customizable objective/score functions for the optimization algorithm
4. Define search space(s) for the deployment-parameter exploration
5. Develop an automated exploration and optimization flow around the MLonMCU tool which can take a batch of parameters and return the metrics used as inputs of the objective function
6. Investigate the potential of fine-grained (per-layer) optimization compared to a holistic (end-to-end) approach
7. Optional: Introduce constraints (for example, ROM footprint < 1 MB) to remove illegal candidates from the search space (and potentially skip the time-consuming execution of such candidates)
8. Optional: Allow fast estimation of deployment metrics by training a cost model based on previous experiments.

References
[1] Mueller-Gritschneder, D., Devarajegowda, K., Dittrich, M., Ecker, W., Greim, M., & Schlichtmann, U. (2017, October). The extendable translating instruction set simulator (ETISS) interlinked with an MDA framework for fast RISC prototyping. In Proceedings of the 28th International Symposium on Rapid System Prototyping: Shortening the Path from Specification to Prototype (pp. 79-84). GitHub: https://github.com/tum-ei-eda/etiss
[2] Chen, T., Moreau, T., Jiang, Z., Shen, H., Yan, E. Q., Wang, L., ... & Krishnamurthy, A. (2018). TVM: end-to-end optimization stack for deep learning. arXiv preprint arXiv:1802.04799. GitHub: https://github.com/apache/tvm
[3] van Kempen, P., Stahl, R., Mueller-Gritschneder, D., & Schlichtmann, U. (2023, September). MLonMCU: TinyML benchmarking with fast retargeting. In Proceedings of the 2023 Workshop on Compilers, Deployment, and Tooling for Edge AI (pp. 32-36). GitHub: https://github.com/tum-ei-eda/mlonmcu
[4] Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019, July). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2623-2631). GitHub: https://github.com/optuna/optuna
[5] Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). GitHub: https://github.com/dmlc/xgboost

Contact

Philipp van Kempen

Supervisor:

Philipp van Kempen

Cadence Internship Position for AI/ML-assisted Functional Verification

Description

See the attached PDF.

Contact

marion@cadence.com

Supervisor:

Ulf Schlichtmann - (Cadence)

Student Assistant: FPGA Synthesis and Programming

Description

See the attached PDF file.

Contact

Supervisor:

Ulf Schlichtmann - Thomas Becker (Lehrstuhl für Brau- und Getränketechnologie)