Seminar on Integrated Systems

Lecturer (assistant)
Type: Seminar
Duration: 3 SWS
Term: Summer semester 2024
Language of instruction: English

Dates

Admission information

See TUMonline
Note: Limited number of participants! Registration via TUMonline from March 27th to April 21st, 2024 is required. Students have to choose a seminar topic before the introductory lesson; therefore, you need to contact the supervisor of the topic you are interested in. Topics are assigned on a first-come, first-served basis. Topics will be published on April 8th, 2024 at https://www.ce.cit.tum.de/en/lis/teaching/seminars/seminar-integrierte-systeme/.

Objectives

At the end of the seminar, the student is able to present a state-of-the-art literature review in the area of integrated systems in an understandable and convincing manner. The following competencies will be acquired:

* The student is able to independently analyze state-of-the-art concepts in the field of integrated systems.
* The student is able to present a topic in a structured way according to problem formulation, state of the art, goals, methods, and results.
* The student can present a topic according to the structure given above, both orally with a set of slides and in a written report.

Description

Specific topics in the area of integrated circuits and systems will be offered. The participants independently work on a current scientific topic, write a paper, design a poster and present their topic in a talk. In the subsequent discussion, the topic will be treated in-depth.

Prerequisites

Basic knowledge of integrated circuits and systems and their applications.

Teaching and learning methods

Participants elaborate a given scientific topic by themselves in coordination with the respective research assistant.

Examination

Examination with the following elements:

- 50 % paper of 4 pages in IEEE format
- 50 % presentation of 15-20 minutes and subsequent questions

Recommended literature

Topic-specific literature will be recommended by the respective supervisor and should be supplemented by the student's own research.

Links

Offered Topics

Seminars

Investigating DNN Accuracy Predictors for Network Architecture Search

Description

Advancements in neural network accuracy predictors have reduced architecture evaluation costs in Network Architecture Search (NAS). However, a major trade-off is the lack of generalizability. Neural predictors typically rely on architecture-specific encodings within designated search spaces, limiting the scope of exploration for search algorithms. Specialized predictors are designed for individual search spaces, requiring additional training and labeling costs. A generalized neural predictor would overcome these constraints by accepting input from multiple search spaces and learning a robust architecture representation. This seminar investigates accuracy prediction methods that estimate and rank network performance across unseen search spaces, not limited to open-source NAS benchmarks.

Reference: Mills, Keith G., et al. "Gennape: Towards generalized neural architecture performance estimators." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 8. 2023.
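
To make the idea concrete, here is a minimal, hypothetical sketch (not taken from the referenced paper) of a predictor workflow: architectures are encoded as feature vectors, a surrogate regressor is trained on a few evaluated architectures, and ranking quality on unseen architectures is assessed with Kendall's tau. The encoding and the synthetic_accuracy stand-in for real training are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-ins: each row encodes an architecture (depth, width,
# kernel sizes, ...); synthetic_accuracy() replaces expensive full training.
def synthetic_accuracy(archs):
    return 0.6 + 0.3 * archs[:, 0] - 0.1 * (archs[:, 1] - 0.5) ** 2

train_archs = rng.uniform(0.0, 1.0, size=(200, 5))   # already-evaluated architectures
unseen_archs = rng.uniform(0.0, 1.0, size=(50, 5))   # never trained or evaluated

# Fit the accuracy predictor on the labeled architectures.
predictor = GradientBoostingRegressor().fit(train_archs, synthetic_accuracy(train_archs))
predicted = predictor.predict(unseen_archs)

# For NAS, correct *ranking* matters more than absolute error, so report Kendall's tau.
tau, _ = kendalltau(predicted, synthetic_accuracy(unseen_archs))
print(f"Kendall's tau on unseen architectures: {tau:.3f}")
```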

 

Contact

Shambhavi.Balamuthu-Sampath@bmw.de

 

Supervisor:

William Wulff - Shambhavi Balamuthu Sampath (BMW)

Reparametrizable DNNs for efficient inference on edge

Description

Complicated Convolutional Neural Networks achieve higher accuracy but have drawbacks during deployment. Multi-branch designs are challenging to implement, slow down inference, and reduce memory utilization. Certain layers like the depthwise convolution or channel shuffle increase memory access cost and sometimes lack device support. Inference speed is influenced by multiple factors, not just floating-point operations (FLOPs). Nonetheless, multi-branch networks are beneficial for improved performance. To tackle this speed-accuracy trade-off, decoupling the training-time multi-branch architecture from the inference-time plain network architecture is possible. This seminar aims to investigate works that achieve fast inference through structural reparametrization of multi-branch networks. 

Reference: Ding, Xiaohan, et al. "Repvgg: Making vgg-style convnets great again." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
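
As a small illustration of the idea, a simplified sketch in the spirit of RepVGG (BatchNorm omitted; the real method also folds BN into each branch before fusing): the three training-time branches of a block, a 3x3 convolution, a 1x1 convolution, and an identity shortcut, can be folded into a single 3x3 convolution for inference because convolution is linear.

```python
import torch
import torch.nn as nn

C = 8  # channels (in == out, so the identity branch is valid)

# Training-time multi-branch block.
conv3 = nn.Conv2d(C, C, 3, padding=1)
conv1 = nn.Conv2d(C, C, 1)

def multi_branch(x):
    return conv3(x) + conv1(x) + x  # 3x3 branch + 1x1 branch + identity

# Structural reparametrization: fuse the three branches into one 3x3 convolution.
w = conv3.weight.detach().clone()
b = conv3.bias.detach() + conv1.bias.detach()
w[:, :, 1:2, 1:2] += conv1.weight.detach()   # place the 1x1 kernel at the centre
w[range(C), range(C), 1, 1] += 1.0           # identity branch as a centred delta kernel

fused = nn.Conv2d(C, C, 3, padding=1)
fused.weight.data, fused.bias.data = w, b

x = torch.randn(1, C, 16, 16)
print(torch.allclose(multi_branch(x), fused(x), atol=1e-5))  # True
```

At inference time only the fused layer is kept, which removes the branch overhead while preserving the trained function exactly.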

 

Contact

Shambhavi.Balamuthu-Sampath@bmw.de

Supervisor:

William Wulff - Shambhavi Balamuthu Sampath (BMW)

Minimizing on-device latency measurements for HW-aware DNN optimization

Description

For resource-constrained edge hardware, the development and deployment of high-performance Deep Neural Networks (DNNs) across various applications requires the optimization of DNNs for both latency and accuracy. One way to achieve this is Network Architecture Search (NAS), which aims to find efficient inference models for the target hardware. However, it is an iterative process of evaluating numerous models that incurs significant computational costs. Additionally, accurately measuring on-device latency for each model within a large search space is both time-consuming and impractical. To address these challenges, this seminar focuses on the concept of adaptive sampling, which aims to minimize on-device latency measurements and accelerate hardware-aware DNN optimization. We thoroughly investigate end-to-end DNN latency prediction methods that can be made sample-efficient. Furthermore, we aim to quantify the potential impact on the GPU hours saved within a typical NAS pipeline, comparing scenarios with and without employing the latency estimator.
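
As a rough, hypothetical sketch of adaptive sampling (illustration only, not a specific published method): an ensemble predictor is refit on the architectures measured so far, and the next on-device measurement is spent on the candidate where the ensemble disagrees most. The feature encoding and the true_latency() stand-in for a real device measurement are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-ins: each row encodes a candidate DNN (depth, width, ...);
# true_latency() plays the role of a costly on-device latency measurement.
pool = rng.uniform(0.0, 1.0, size=(500, 6))
def true_latency(archs):
    return 5.0 * archs[:, 0] + 2.0 * archs[:, 1] ** 2

measured = list(rng.choice(len(pool), size=10, replace=False))   # small seed set

for _ in range(20):                                   # adaptive-sampling loop
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(pool[measured], true_latency(pool[measured]))
    # Uncertainty = spread of the per-tree predictions; measure where it is largest.
    per_tree = np.stack([tree.predict(pool) for tree in model.estimators_])
    uncertainty = per_tree.std(axis=0)
    uncertainty[measured] = -np.inf                   # never re-measure a known model
    measured.append(int(uncertainty.argmax()))

print(f"on-device measurements used: {len(measured)} of {len(pool)} candidates")
```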

Contact

Shambhavi.Balamuthu-Sampath@bmw.de

Supervisor:

William Wulff - Shambhavi Balamuthu Sampath (BMW)

Accelerating End-to-End Autonomous Driving Models on Edge Hardware

Description

 

In recent years, foundation models have become popular as generic solvers for different tasks, such as text generation, image generation, and semantic segmentation.

In an effort to unify the challenges of autonomous driving, from perception, to occupancy, motion, and path planning, recent works have attempted to create foundation models for end-to-end autonomous driving.




These models have performed exceptionally well and have proven to be a promising candidate for solving the challenges of higher levels of autonomous driving. However, the complexity of these models and their strictly sequential structure make it difficult for them to meet real-time execution demands. In this seminar topic, the different approaches to end-to-end autonomous driving will be researched and compared. Then an analytical study will be performed to identify the hardware challenges and opportunities in accelerating them on edge hardware.

Contact

Nael.Fasfous@bmw.de

Supervisor:

Nael Yousef Abdullah Al-Fasfous - Nael Fasfous (BMW Group)

Assigned Topics

Seminars

A Survey on NVM technologies

Description

NVM (non-volatile memory) technologies are essential for most kinds of computer systems. However, they also bring challenges, such as a limited lifespan.

The goal of this seminar is to study various NVM technologies and their optimizations and to present their benefits and use cases. A special focus should be put on use cases, benefits and drawbacks, and application costs. A starting point of literature will be provided.

 

Prerequisites

B.Sc. in Electrical engineering or similar degree

Contact

Oliver Lenke

o.lenke@tum.de

 

Supervisor:

Oliver Lenke

Access-Predictors on Cache Level

Description

DRAM modules are indispensable for modern computer architectures. Their main advantages are a simple design with only one transistor per bit and a high memory density.

However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles.

In order to reduce the DRAM access latency, the cache hierarchy can be extended by dedicated hardware access predictors that preload certain data into the caches before it is actually accessed.

The goal of this seminar is to study and compare prefetching mechanisms and access predictors at the cache level, including several optimizations, and to present their benefits and use cases. A starting point of literature will be provided.
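
For illustration, here is a minimal behavioral sketch of a stride-based access predictor such as a cache could use to preload data (the PC-indexed table and the single confidence bit are simplifying assumptions, not any particular published design):

```python
# A load's PC indexes a small table that remembers the last address and stride;
# once the same stride repeats, the next cache line in the stream is prefetched.
class StridePrefetcher:
    def __init__(self):
        self.table = {}  # pc -> (last_addr, stride, confident)

    def access(self, pc, addr):
        prefetches = []
        last_addr, stride, confident = self.table.get(pc, (None, 0, False))
        if last_addr is not None:
            new_stride = addr - last_addr
            if new_stride == stride and stride != 0:
                if confident:
                    prefetches.append(addr + stride)  # preload the next line early
                confident = True
            else:
                confident = False
            stride = new_stride
        self.table[pc] = (addr, stride, confident)
        return prefetches

pf = StridePrefetcher()
for i in range(6):                       # a load streaming through an array
    print(pf.access(pc=0x400, addr=0x1000 + 64 * i))
```

Once the same stride has been observed twice for a given load, each new access triggers a prefetch of the following cache line before the core actually asks for it.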

 

Prerequisites

B.Sc. in Electrical engineering or similar degree

Contact

Oliver Lenke

o.lenke@tum.de

 

Supervisor:

Oliver Lenke

Access-Predictors on Cache Level

Description

DRAM modules are indispensable for modern computer architectures. Their main advantages are a simple design with only one transistor per bit and a high memory density.

However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles.

In order to reduce the DRAM access latency, the cache hierarchy can be extended by dedicated hardware access predictors that preload certain data into the caches before it is actually accessed.

The goal of this seminar is to study and compare prefetching mechanisms and access predictors at the cache level, including several optimizations, and to present their benefits and use cases. A starting point of literature will be provided.

 

Prerequisites

B.Sc. in Electrical engineering or similar degree

Contact

Oliver Lenke

o.lenke@tum.de

 

Supervisor:

Oliver Lenke

An Overview of Service Migration in Modern Edge Computer Networks

Description

In modern Edge computer networks, applications and services should adhere to service-level agreements (SLAs) such as low latency or minimum throughput. Depending on demand and resource availability, these services have to be migrated between compute nodes to ensure these SLAs.

Service migration is a critical aspect of Edge computing, enabling the movement of services closer to the data source or end-users for improved performance and reduced latency. However, it comes with its own set of challenges, such as maintaining service continuity and managing resource constraints. This involves checkpointing and restarting of the applications (potentially in containers), as well as moving the data from one compute node to the other. This data movement could be further improved with RDMA technology.

This seminar should provide a background overview of the required technologies for service migration and explore recent improvements for low-latency service migration in both hardware and software.
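
Purely as an illustrative sketch of the decision side of service migration (node names, capacities and the SLA threshold are made up; the actual checkpoint, data transfer and restore, e.g. via CRIU or a container runtime, are abstracted behind a comment):

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cpu: float
    rtt_ms_to_user: float

def pick_target(nodes, cpu_needed):
    # Prefer the node closest to the user that still has enough free capacity.
    candidates = [n for n in nodes if n.free_cpu >= cpu_needed]
    return min(candidates, key=lambda n: n.rtt_ms_to_user, default=None)

def maybe_migrate(service, current_latency_ms, sla_ms, nodes):
    if current_latency_ms <= sla_ms:
        return None                                  # SLA met, stay put
    target = pick_target(nodes, cpu_needed=1.0)
    if target:
        # Here the real system would checkpoint the service, copy its state
        # (possibly via RDMA), and restore it on the target node.
        print(f"migrating {service} to {target.name}")
    return target

nodes = [Node("edge-a", 0.5, 18.0), Node("edge-b", 2.0, 6.0), Node("cloud", 8.0, 42.0)]
maybe_migrate("video-analytics", current_latency_ms=35.0, sla_ms=20.0, nodes=nodes)
```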

Contact

marco.liess@tum.de

Supervisor:

Marco Liess

Exploring Linux eBPF Mechanism for SmartNICs

Description

eBPF (extended Berkeley Packet Filter) is a technology used in Linux for running user-defined sandboxed programs in the kernel without changing kernel source code or loading kernel modules. In networking, eBPF can be used to redefine the network stack behavior by allowing the dynamic insertion of powerful networking and security functions deep inside the Linux kernel.

SmartNICs (Network Interface Cards with a programmable processor) can offload some processing tasks that the system CPU would normally handle. This is beneficial in freeing up CPU resources and improving networking performance. eBPF can be used in conjunction with SmartNICs to offload some network processing tasks to the SmartNIC, further enhancing performance.

The goal of this seminar topic is to provide a background overview of Linux eBPF in networking and to explore how eBPF can be leveraged in SmartNICs to improve network performance and security. Look into recent advancements, challenges, and future prospects.
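
As a starting point, a minimal packet-counting XDP program attached from Python via bcc may help make the mechanism concrete. This is only a sketch: the interface name is an assumption, running it requires root, and true SmartNIC offload additionally requires attaching with the hardware-offload flag on a NIC whose driver supports it.

```python
import time
from bcc import BPF

# Minimal XDP program (C, embedded via bcc) that counts every packet seen at
# the driver/NIC level and lets it continue up the stack.
prog = r"""
#include <uapi/linux/bpf.h>
BPF_ARRAY(pkt_count, u64, 1);
int count_packets(struct xdp_md *ctx) {
    int key = 0;
    u64 *value = pkt_count.lookup(&key);
    if (value)
        __sync_fetch_and_add(value, 1);
    return XDP_PASS;
}
"""

device = "eth0"                      # assumption: adapt to the target interface
b = BPF(text=prog)
fn = b.load_func("count_packets", BPF.XDP)
b.attach_xdp(device, fn, 0)          # 0 = default mode; SmartNIC offload needs the HW-mode flag
try:
    while True:
        time.sleep(1)
        total = sum(v.value for v in b["pkt_count"].values())
        print(f"packets seen on {device}: {total}")
finally:
    b.remove_xdp(device, 0)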

Contact

marco.liess@tum.de

Supervisor:

Marco Liess

DRAM Controller with Access Predictors

Description

DRAM modules are indispensable for modern computer architectures. Their main advantages are a simple design with only one transistor per bit and a high memory density.

However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles.

In order to reduce the DRAM access latency, DRAM controllers provide sophisticated mechanisms, such as access predictors or built-in caches. The goal of this seminar is to study and compare DRAM controller designs with several optimizations and to present their benefits and use cases. A starting point of literature will be provided.
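
To give one concrete flavour of such a mechanism, here is a hypothetical behavioral sketch (not any specific controller design) of a per-bank row-buffer policy predictor: a saturating counter learns from recent row hits and misses whether to keep the open row active (open-page) or precharge it right away (closed-page).

```python
class RowPolicyPredictor:
    def __init__(self, banks):
        self.counter = [2] * banks        # saturating 0..3; >= 2 means "keep row open"
        self.open_row = [None] * banks    # currently open row per bank (None = precharged)

    def access(self, bank, row):
        hit = self.open_row[bank] == row
        # Train the predictor: hits reward keeping rows open, misses punish it.
        delta = 1 if hit else -1
        self.counter[bank] = max(0, min(3, self.counter[bank] + delta))
        keep_open = self.counter[bank] >= 2
        self.open_row[bank] = row if keep_open else None
        return "row hit " if hit else "row miss"

pred = RowPolicyPredictor(banks=4)
for row in [7, 7, 7, 3, 3, 9, 1, 5]:      # access stream to bank 0
    policy = "open-page" if pred.counter[0] >= 2 else "closed-page"
    print(pred.access(0, row), "->", policy)
```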

 

Prerequisites

B.Sc. in Electrical engineering or similar degree

Contact

Oliver Lenke

o.lenke@tum.de

 

Supervisor:

Oliver Lenke

A Survey on Benchmarking Systems

Keywords:
Benchmark, Linux

Description

As technology advances, the performance of CPUs plays a crucial role in various computational tasks ranging from everyday computing to specialized applications like gaming, artificial intelligence, and scientific simulations. Benchmarking CPU performance helps in understanding and comparing the capabilities of different processors across various workloads. This seminar topic aims to conduct a comprehensive survey on benchmark suites commonly used for evaluating CPU performance.

For this, various state-of-the-art benchmark suites should be analyzed and compared against each other based on pre-defined criteria.

The goal of this survey is to generate an overview and comparative analysis of the different benchmark suites that are available and focus on their unique approaches.

Supervisor:

Tim Twardzik

Advancements in Vector Processor Architectures

Description

Vector processors are a type of SIMD processor that can efficiently operate on arrays, allowing significant speed-ups for certain applications, such as scientific computing and DSP. While vector processors were initially successful in the supercomputers of the 70s and 80s, modern microprocessors gradually displaced them from the market, with a few exceptions.

However, recent years have seen a lot of research and commercial interest in vector processors. Major developments include vector and vector-like extensions to popular ISAs, including RISC-V (RISC-V V) and ARM (SVE2), and commercial ASICs such as NEC's SX-Aurora TSUBASA and AMD's Southern Islands GPUs. In part, the renewed interest in vector architectures is due to scalar processors falling behind in terms of performance and the need to find alternatives to current CPU design paradigms. But vector processors also offer advantages for more specialized applications, including embedded systems, where vector processors can potentially lead to improvements in performance and energy efficiency.
 

Contact

william.wulff@tum.de

Supervisor:

William Wulff

Chiplet-Based Architecture Design

Description

Chiplet-based architectures are starting to become available, notably with the release of Intel’s Meteor Lake consumer CPUs at the end of 2023. Even though most major players in the field are pursuing this strategy, there does not yet seem to be a clear consensus on aspects like the chiplet-to-chiplet interconnect. The Universal Chiplet Interconnect Express (UCIe) standard appears to be a promising approach, but others are being developed, for example Bunch of Wires (BoW). In this seminar work, literature on chiplets should be investigated, specifically on topics such as the die-to-die interconnect or further challenges in the design of chiplet architectures.

Starting points for literature research could be the following papers:

https://ieeexplore.ieee.org/abstract/document/8416868

https://ieeexplore.ieee.org/abstract/document/9174651

https://ieeexplore.ieee.org/abstract/document/9893865

Contact

michael.meidinger@tum.de

Supervisor:

Michael Meidinger

DRAM Controller with Access Predictors

Description

DRAM modules are indispensable for modern computer architectures. Their main advantages are a simple design with only one transistor per bit and a high memory density.

However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles.

In order to reduce the DRAM access latency, DRAM controllers provide sophisticated mechanisms, such as access predictors or built-in caches. The goal of this seminar is to study and compare DRAM controller designs with several optimizations and to present their benefits and use cases. A starting point of literature will be provided.

 

Prerequisites

B.Sc. in Electrical engineering or similar degree

Contact

Oliver Lenke

o.lenke@tum.de

 

Supervisor:

Oliver Lenke

Analysis and Comparison of FPGA-Optimized Network-On-Chips

Description

Network-on-chip (NoC) is a communication architecture used in multi-core and many-core systems to interconnect processing elements (PEs), such as CPUs, GPUs, accelerators, and memory controllers, using packet-switched networks similar to those found in computer networks. It replaces traditional bus-based interconnects with a scalable and modular network infrastructure, offering higher performance, lower latency, and improved scalability. In a NoC, PEs are connected through a network of routers and links, forming a mesh, torus, or other topologies. Each router is responsible for forwarding packets between neighboring PEs using routing algorithms. NoC architectures can vary greatly in terms of topology, routing algorithms, flow control mechanisms, and other parameters, depending on the specific application requirements and design constraints.

Field-Programmable Gate Arrays (FPGAs) are integrated circuits that contain an array of configurable logic blocks interconnected through programmable routing resources. They provide a versatile and powerful platform for implementing digital circuits and systems, offering flexibility, reconfigurability, parallelism, and hardware acceleration capabilities. Hence, they are well-suited for a wide range of applications across various domains, including telecommunications, networking, automotive, aerospace, consumer electronics, and industrial automation.

FPGA-optimized NoCs are tailored to exploit the unique features and capabilities of FPGAs while addressing the challenges of communication and interconnection in FPGA-based systems. They play a crucial role in enabling efficient and scalable communication infrastructure for FPGA-based applications across a wide range of domains. The goal of this seminar work is to investigate and compare state-of-the-art NoCs optimized for FPGAs.
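
As a tiny illustration of one of the design points mentioned above (the routing algorithm), the following sketch shows dimension-ordered XY routing on a 2D mesh, a common deterministic baseline in mesh NoCs; the coordinates and mesh size are arbitrary.

```python
def xy_route(src, dst):
    """Return the sequence of (x, y) router coordinates a packet traverses."""
    x, y = src
    hops = [(x, y)]
    while x != dst[0]:                  # route along X first (fixed dimension order)
        x += 1 if dst[0] > x else -1
        hops.append((x, y))
    while y != dst[1]:                  # then route along Y
        y += 1 if dst[1] > y else -1
        hops.append((x, y))
    return hops

print(xy_route((0, 0), (3, 2)))  # three hops east, then two hops north
```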

Relevant literature

[1] Huan, Yutian, and André DeHon. "FPGA optimized packet-switched NoC using split and merge primitives." 2012 International Conference on Field-Programmable Technology. IEEE, 2012.

[2] Kapre, Nachiket, and Jan Gray. "Hoplite: Building austere overlay nocs for fpgas." 2015 25th international conference on field programmable logic and applications (FPL). IEEE, 2015.

[3] Monemi, Alireza, et al. "ProNoC: A low latency network-on-chip based many-core system-on-chip prototyping platform." Microprocessors and Microsystems 54 (2017): 60-74.

Contact

Klajd Zyla

Email: klajd.zyla@tum.de

Supervisor:

Klajd Zyla

Rule-Based Reinforcement Learning in HW

Description

This seminar should investigate which attempts have been made to implement rule-based reinforcement learning (RL) in hardware/FPGAs since a first attempt in 2006: https://www.sciencedirect.com/science/article/pii/S1383762106000208

Contact

flo.maurer@tum.de

Supervisor:

Reward Assignment in Reinforcement Learning

Description

RL tries to maximize the reward over time.

Therefore, the way the reward is assigned to the agent plays a central role.

Often, the reward is assumed to be predetermined, or reward functions are introduced without further justification.

This seminar should investigate reward assignment strategies to solve RL problems.
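
To make the problem tangible, here is a small sketch on a made-up 1-D chain task (reach position 5) contrasting a sparse goal-only reward with potential-based reward shaping, and the discounted return each assigns to the same trajectory; all numbers are illustrative.

```python
GAMMA = 0.9
GOAL = 5

def phi(state):
    return -(GOAL - state)          # potential: negative distance to the goal

def sparse_reward(state, nxt):
    return 1.0 if nxt == GOAL else 0.0

def shaped_reward(state, nxt):
    # Potential-based shaping adds gamma*phi(s') - phi(s), which rewards progress
    # toward the goal without changing which policy is optimal.
    return sparse_reward(state, nxt) + GAMMA * phi(nxt) - phi(state)

def discounted_return(trajectory, reward_fn):
    return sum(GAMMA ** t * reward_fn(s, s_next) for t, (s, s_next) in enumerate(trajectory))

trajectory = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # one step toward the goal each time
print("sparse reward return:", round(discounted_return(trajectory, sparse_reward), 3))
print("shaped reward return:", round(discounted_return(trajectory, shaped_reward), 3))
```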

Contact

flo.maurer@tum.de

Supervisor: