PASSTA

PASSTA (IPC and Synchronizing Shared data on heterogeneous MPSoCs) is a project in cooperation with Huawei that focuses on assisting traditional operating system services with hardware. Currently, we investigate Inter-Process Communication (IPC) mechanisms in Linux and plan to extend the developed concepts to applications within microkernels.

IPC is a general term for mechanisms that processes use to communicate with each other. This communication can serve data transfer, synchronization, or both. Traditionally, IPC functionality is tightly integrated into the operating system because it is critical for achieving high throughput and low latency. The ongoing development towards heterogeneous multi-/many-core processor architectures exposes performance insufficiencies in established IPC services. An increasing number of CPUs and more fine-grained multi-threaded and multi-process applications lead to more dependencies and more data exchange between application parts, which use IPC services to synchronize accesses to the same dataset. This ever-increasing use of IPC mechanisms makes the insufficiencies more visible and degrades application performance.

When multiple threads interact, a thread may have to wait for a certain condition before it can continue its execution. Such a condition may be, for instance, a free lock or the availability of data. An event is a state change of this condition and can be triggered by different sources such as device drivers, communication channels, or interacting threads. Event notification informs a waiting thread that a particular event of interest has occurred. In PASSTA, we focus on blocking event notification mechanisms. In these mechanisms, the event-receiving thread calls a specific function to get new events. This function returns successfully if an event is available. Otherwise, it puts the thread to sleep until the corresponding condition is met (blocking behavior). An event-generating thread has to notify the sleeping thread about the condition change, and thus about the occurred event, by triggering its wake-up (event notification).
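The blocking behavior described above can be sketched in user space. The following is a minimal model, not the Linux kernel mechanism: the class `EventChannel` and its methods `wait_event` and `notify_event` are hypothetical names chosen for this illustration, and Python's `threading.Condition` stands in for the kernel's sleep and wake-up machinery.

```python
import threading


class EventChannel:
    """User-space model of a blocking event notification mechanism."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0  # events generated but not yet consumed

    def wait_event(self, timeout=None):
        """Return True once an event is available; sleep otherwise."""
        with self._cond:
            # wait_for() re-checks the condition after every wake-up.
            available = self._cond.wait_for(lambda: self._pending > 0, timeout)
            if available:
                self._pending -= 1
            return available

    def notify_event(self):
        """Generate an event and initiate the wake-up of one sleeper."""
        with self._cond:
            self._pending += 1   # event generation: the condition changes
            self._cond.notify()  # event notification: wake-up initiation


def demo():
    channel = EventChannel()
    received = []

    def receiver():  # plays the role of the event-receiving thread
        received.append(channel.wait_event(timeout=5.0))

    thread_a = threading.Thread(target=receiver)
    thread_a.start()
    channel.notify_event()  # the main thread generates the event
    thread_a.join()
    return received
```

Calling `demo()` starts a receiver that either finds the event immediately or sleeps until `notify_event()` wakes it; in both cases `wait_event()` returns successfully.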

To improve this event notification, we develop in PASSTA a concept that assists blocking IPC mechanisms with a hardware component (HWAcc). When improving event notification, two metrics have to be considered:

  1. Metric 1 - Syscall duration: As depicted in the figure with M1, the syscall includes the event generation and the event notification that initiates the thread wake-up. The CPU cycles required for the syscall show the overhead that an event notification can add to a thread that generates an event.
  2. Metric 2 - Wake-up latency: This metric denotes the number of cycles for a thread wake-up initiation, and is labeled with M2 in the figure. For this, we measure the time from the start of the event notification function in thread B until thread A is active. As the HWAcc processes a request asynchronously, only metric 2 includes the execution time of the HWAcc itself.
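To make the two metrics concrete, the sketch below measures user-space analogues of M1 and M2 for a single notification. It rests on several assumptions: it uses wall-clock nanoseconds from `time.perf_counter_ns()` instead of CPU cycles, `threading.Event` instead of a kernel IPC mechanism, and a short sleep to let thread A block first. The function name `measure_once` is hypothetical.

```python
import threading
import time


def measure_once():
    """Return (m1_ns, m2_ns) for one notification between two threads."""
    event = threading.Event()
    stamps = {}

    def thread_a():
        event.wait()                                 # sleep until notified
        stamps["a_active"] = time.perf_counter_ns()  # thread A runs again

    waiter = threading.Thread(target=thread_a)
    waiter.start()
    time.sleep(0.05)  # simplification: give thread A time to block

    # Thread B (here: the calling thread) generates the event.
    stamps["notify_start"] = time.perf_counter_ns()
    event.set()  # event generation plus wake-up initiation
    stamps["notify_end"] = time.perf_counter_ns()
    waiter.join()

    m1 = stamps["notify_end"] - stamps["notify_start"]  # burden on thread B
    m2 = stamps["a_active"] - stamps["notify_start"]    # wake-up latency
    return m1, m2
```

The absolute values depend heavily on the interpreter and the host system, so they only illustrate what each metric spans and are not comparable to the cycle counts measured in the project.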

Reduce burden for event-generating thread (Metric 1):

Thread A has to wait for a particular condition before it can continue its execution, e.g., when a lock is not available (Cond. Check). In Linux, a wait list specifies which notification is expected when the corresponding event occurs. Thread A fills this wait list before it goes to sleep. A sleeping thread is notified by initiating its wake-up. Many wait lists exist in the kernel, each tied to a certain element (e.g., a file descriptor), while the wait list structure is always the same. When thread B generates an event, the condition becomes valid, e.g., when a lock is released (Event Gen.). Thread B therefore queries the wait list to check whether another thread is waiting for the event and, in the original event notification approach, wakes up thread A itself (Event Notify).

In PASSTA we developed a concept to facilitate the event notification by initiating the thread wake-up from a hardware unit (HWAcc), thus relieving the thread that generates an event. This can be achieved by replacing the default notification function in the wait list in step Cond. Check with one that offloads the wake-up initiation to the HWAcc. The HWAcc then asynchronously initiates a thread wake-up.
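The idea of swapping the notification function in a wait list entry can be illustrated with a small user-space model. Everything here is an assumption made for illustration: `WaitEntry`, `default_notify`, `HwAccModel`, and `notify_all` are hypothetical names, and a background thread with a queue merely imitates a hardware unit that processes requests asynchronously.

```python
import queue
import threading


class WaitEntry:
    """One waiter in a wait list: who sleeps and how to notify it."""

    def __init__(self, name, notify_func):
        self.name = name
        self.wakeup = threading.Event()  # stands in for the sleeping thread
        self.notify_func = notify_func   # called by the event generator


def default_notify(entry):
    """Original approach: the event-generating thread initiates the wake-up."""
    entry.wakeup.set()


class HwAccModel:
    """Hypothetical stand-in for the HWAcc with asynchronous processing."""

    def __init__(self):
        self.requests = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def offload_notify(self, entry):
        """Replacement notification function: only enqueue a request."""
        self.requests.put(entry)

    def _run(self):
        while True:
            entry = self.requests.get()
            entry.wakeup.set()  # wake-up initiated outside thread B
            self.requests.task_done()


def notify_all(wait_list):
    """Event Notify step: call the function stored in each entry."""
    for entry in wait_list:
        entry.notify_func(entry)


def demo():
    hwacc = HwAccModel()
    wait_list = [
        WaitEntry("A-default", default_notify),
        WaitEntry("A-offloaded", hwacc.offload_notify),
    ]
    notify_all(wait_list)
    return [entry.wakeup.wait(timeout=5.0) for entry in wait_list]
```

In this model the event generator's work for the offloaded entry shrinks to a single enqueue, while the actual wake-up happens later in the worker, which mirrors why only Metric 2 includes the HWAcc execution time.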

Reducing the latency in blocking IPC mechanisms (Metric 2):

Several steps are involved in the wake-up procedure triggered by an event-generating thread. First, the waker (thread B) determines which thread to wake up and where to wake it up. After that, an IRQ/IPI is sent to the core on which thread A should be woken up. On the wakee side in the interrupt service routine, the wakelist is checked, and consequently, the scheduler is triggered to determine whether the newly awakened thread A should run on the core. All these steps contribute to the latency of IPC and consist mainly of scheduling-related functions.
We aim to decrease the time spent in scheduling-related functions to reduce the latency of blocking IPC mechanisms. Therefore, we introduce a hardware-assisted scheduling class into Linux, which offloads scheduling functionalities to a dedicated hardware unit. This newly created scheduling class coexists with the standard scheduling classes, thus enabling a seamless integration into the Linux kernel.
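How an additional scheduling class can coexist with existing ones is sketched below as a simplified model. Linux asks its scheduling classes in priority order for the next task; the model keeps only that idea. The class names, the position of the hardware-assisted class in the order, and the counter `offloaded_decisions` are assumptions of this sketch, and the "hardware" decision is reduced to a FIFO pop.

```python
from collections import deque


class SchedClass:
    """Simplified scheduling class with its own run queue."""

    def __init__(self, name):
        self.name = name
        self.runqueue = deque()

    def enqueue(self, task):
        self.runqueue.append(task)

    def pick_next(self):
        return self.runqueue.popleft() if self.runqueue else None


class HwAssistedClass(SchedClass):
    """Hypothetical class whose decision would come from a hardware unit."""

    def __init__(self, name):
        super().__init__(name)
        self.offloaded_decisions = 0

    def pick_next(self):
        # A real design would query the hardware unit here.
        self.offloaded_decisions += 1
        return super().pick_next()


def pick_next_task(classes):
    """Ask each class in priority order; the first runnable task wins."""
    for sched_class in classes:
        task = sched_class.pick_next()
        if task is not None:
            return sched_class.name, task
    return None


def demo():
    hw = HwAssistedClass("hw_assisted")
    fair = SchedClass("fair")
    classes = [hw, fair]  # ordering is an assumption of this sketch
    fair.enqueue("worker")
    hw.enqueue("thread A")
    return [pick_next_task(classes) for _ in range(3)]
```

Tasks that are not assigned to the hardware-assisted class are still picked by the other class, which is the sense in which the classes coexist in this model.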

Thesis Offers

Development of a web application to control a hardware demonstration platform

Description

In this thesis, you lead the design and development of a web application that controls a hardware demonstration platform and visualizes the load of the available hardware resources, such as CPUs and hardware units. The hardware platform is a Xilinx Zynq board. It features a heterogeneous ARM multicore setup directly integrated into the ASIC, combined with programmable logic in the FPGA part of the chip. The FPGA implements a hardware assist that improves blocking mechanisms in Linux by helping the kernel manage waiting threads. For further insights, the FPGA is also equipped with a 10G Ethernet connection to send live data to a different PC for analysis and status information.

Responsibilities:

  • Understand the hardware demonstration platform's functionality and requirements for visualization and control.
  • Utilize your expertise in web development to design and create an interactive web application that visually represents hardware resource utilization (e.g., CPUs, hardware units) through graphs, charts, or other intuitive visualizations.
  • Develop a user-friendly control interface within the web application allowing users to start, manage, and monitor software applications running on the demonstration platform.
  • Conduct comprehensive testing, debugging, and optimization of the web application to ensure seamless functionality and performance.
  • Engage in regular meetings, providing updates on the project's progress, and actively participate in software design and implementation discussions.

 

Prerequisites

  • Experience with data visualization libraries or similar tools to create dynamic and informative visual representations.
  • First experience in web development projects, coursework, or internships showcasing relevant skills and expertise.
  • Understanding of hardware resource monitoring and visualization concepts and the ability to translate these into effective user interfaces.
  • Strong problem-solving skills, attention to detail, and the ability to work both independently and collaboratively in a team environment.
  • Knowledge of system administration or hardware-related concepts to facilitate seamless integration between the web application and the demonstration platform.

Contact

Email: lars.nolte@tum.de

Supervisor:

Lars Nolte

Interested in an internship or a thesis? Please send us (Tim Twardzik, Lars Nolte) an email.
The given type of work is just a guideline and could be changed if needed.
From time to time, there might be some work that is not announced yet. Feel free to ask!

Ongoing Theses

Hardware-Accelerated Linux Kernel Tracing

Description

Tracing events with hardware components is a powerful way to monitor, debug, and improve existing designs. It provides detailed insights and helps to achieve peak performance, but integrating it without degrading performance is challenging. One of the major challenges of tracing is to collect as much information as possible with ideally no impact on the system under analysis. This ensures that the gained insights are representative of an execution without any tracing enabled. In this work, a hardware tracing component should be leveraged to reduce the intrusiveness of existing software tracing mechanisms in the Linux kernel.

This should be integrated and tested on a hardware platform based on a Xilinx Zynq board. This features a heterogeneous ARM multicore setup directly integrated into the ASIC, combined with programmable logic in the FPGA part of the chip. In the FPGA a hardware accelerator is already implemented that should be traced with the new component.

Prerequisites

To successfully complete this work, you should have:

  • experience with microcontroller programming,
  • basic knowledge about Git,
  • first experience with the Linux environment.

The student is expected to be highly motivated and independent.

Supervisor:

Lars Nolte

Non intrusive hardware tracing over ethernet

Description

Tracing events in hardware components is a powerful way to monitor, debug, and improve existing designs. It provides detailed insights and helps to achieve peak performance, but integrating it without degrading performance is challenging. One of the major challenges of tracing is to collect as much information as possible with ideally no impact on the system under analysis. This ensures that the gained insights are representative of an execution without any tracing enabled. In this work, a hardware tracing component should be designed that takes an arbitrary data input and sends it via an Ethernet connection to a different PC that performs the postprocessing of the data. The tracing component has to be designed such that sending the data over Ethernet requires no CPU involvement, which minimizes the impact on the traced system.

This tracing component should be integrated into the hardware platform based on a Xilinx Zynq board. This features a heterogeneous ARM multicore setup directly integrated into the ASIC, combined with programmable logic in the FPGA part of the chip. In the FPGA a hardware accelerator is already implemented that should be traced with the new component.

Prerequisites

To successfully complete this work, you should have:

  • good HDL programming skills,
  • experience with microcontroller programming,
  • basic knowledge about Git,
  • first experience with the Linux environment.

The student is expected to be highly motivated and independent.

Contact

Email: lars.nolte@tum.de

Supervisor:

Lars Nolte

Comparing DPDK with traditional Linux based networking

Description

With the ever-increasing network speeds of physical links, the processing of packets on network nodes is becoming more and more of a bottleneck. Packet processing on a standard Linux-based network node traditionally involves the operating system (OS). Since an OS is usually optimized for a range of tasks rather than a specific task, using conventional Linux kernel functionalities for packet processing can degrade performance. For this reason, approaches to bypass the kernel have been proposed to perform network processing in user space.

One kernel-bypass approach that has attracted growing interest in recent years is the Data Plane Development Kit (DPDK). By processing packets entirely in user space, DPDK avoids time-consuming context switches between user space and kernel space. This comes at the cost of one CPU core actively polling for new packets, instead of the network interface card (NIC) triggering interrupts for incoming packets. In addition, DPDK itself mainly provides the poll mode drivers for selected NICs, while processing the packets is the duty of the application using DPDK. Thus, while DPDK is suitable for certain application scenarios, there are also numerous use cases that are better implemented using the Linux networking stack. For example, to establish a Transmission Control Protocol (TCP) connection, an additional user space TCP/IP stack must be implemented or taken from open-source projects. These are generally not as feature-rich as the conventional Linux networking stack and do not necessarily improve performance.
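The trade-off between sleeping until data arrives and actively polling for it can be illustrated without DPDK. The sketch below is only an analogy under stated assumptions: it uses a local socket pair from Python's standard library, both variants still go through the kernel, and the function names `receive_blocking` and `receive_polling` are hypothetical. It shows the polling pattern, not kernel bypass.

```python
import socket


def receive_blocking(sock):
    """Interrupt-style model: sleep in the kernel until data arrives."""
    sock.setblocking(True)
    return sock.recv(2048)


def receive_polling(sock, max_polls=1_000_000):
    """Poll-mode model: keep asking for data without ever sleeping."""
    sock.setblocking(False)
    for polls in range(1, max_polls + 1):
        try:
            return sock.recv(2048), polls
        except BlockingIOError:
            continue  # nothing yet: spend CPU time and try again
    return b"", max_polls


def demo():
    tx, rx = socket.socketpair()
    try:
        tx.send(b"packet-1")
        first = receive_blocking(rx)
        tx.send(b"packet-2")
        second, polls = receive_polling(rx)
        return first, second, polls
    finally:
        tx.close()
        rx.close()
```

The polling variant never yields the CPU while it waits, which corresponds to the dedicated polling core mentioned above; the blocking variant frees the CPU but pays for the wake-up when data arrives.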

This work aims to find a method to compare applications using DPDK with applications using the Linux network stack. Envisioned is a client-server application that uses iperf3 to generate data traffic.

Prerequisites

To successfully complete this work, you should have:

  • very good programming skills in Python and C/C++,
  • basic knowledge about Git,
  • first experience with the Linux environment.

The student is expected to be highly motivated and independent.

Contact

Email: lars.nolte@tum.de

Supervisor:

Lars Nolte

Analyzing Power Consumption in a Simulation Model

Description

Besides the raw computational performance of a system on chip (SoC), its power consumption is an important trait. Especially in the mobile domain, where the efficiency of the SoC is critical, the power consumption of the whole system needs to be considered. Multiple factors contribute to the overall power consumption, such as the activity and state of the CPU, memory accesses including cache misses, and the use of potential hardware acceleration.

In this work, the power consumption of an SoC should be analyzed. A gem5 simulation model exists for this purpose.

Supervisor:

Tim Twardzik

Exploring Hardware-Acceleration for the Linux Scheduler

Description

 

The operating system is responsible for scheduling the different threads and processes running in the system. Its goal is to give each thread a fair share of the available compute resources. The Linux kernel includes a software scheduler that determines where and when each thread should run. This decision is critical for the performance of multi-threaded applications, where compute resources need to be shared efficiently among multiple threads. Furthermore, the time it takes to make this decision and to switch from one thread to another are critical performance parameters for any software scheduler.

 

For this project, the goal is to analyze the Linux Scheduler and determine how hardware acceleration can be supported. This can include replacing scheduling classes or offloading specific scheduling functions to a hardware accelerator. At the end of the project a concept for hardware acceleration for the Linux Scheduler should be presented. The project is divided into two subprojects. One part focuses on the regular scheduling functions which are called during the periodic scheduler calls in the Linux kernel, while the other takes a look at rescheduling functionalities through interrupts.

 

Evaluation and comparisons are to be done with the full-system, cycle-accurate simulator gem5. For this, an existing setup of an ARM architecture including various tracing options is available.

 

Supervisor:

Tim Twardzik

Completed Theses

2024

Bachelor's Theses

  • 24.01.2024
    Interprocess Communication: Signal events in user space with ueventfd and upipe
    Supervisor: Lars Nolte

2023

Bachelor's Theses

  • 30.10.2023
    Non-invasive integrated event tracing of FPGA via Ethernet
    Supervisor: Lars Nolte
  • 08.09.2023
    Non intrusive hardware tracing over ethernet
    Supervisor: Lars Nolte
  • 20.06.2023
    Optimization of Hardware Assisted Futex Implementation on Zynq Ultrascale+
    Supervisor: Lars Nolte
  • 20.03.2023 Maximilian Grözinger
    Digital Design and Validation of Hardware Assisted Futex - Implementation on Zynq Ultrascale+
    Supervisor: Lars Nolte
  • 03.03.2023
    Analyzing Remote Procedure Calls in a Linux Environment
    Supervisor: Tim Twardzik

Master's Theses

  • 22.05.2023
    Hardware-assisting the User-Epoll mechanism in Linux
    Supervisor: Lars Nolte
  • 30.03.2023
    Optimizing high-speed network packet processing in Linux
    Supervisor: Lars Nolte

Research Internships (Forschungspraxis)

  • 20.12.2023
    Setting up L4Re on a Raspberry Pi
    Supervisor: Lars Nolte
  • 15.01.2023
    Implementation of a Finite State Machine for Hardware Managed Futexes on Zynq Ultrascale+
    Supervisor: Lars Nolte

Interdisciplinary Projects

  • 17.03.2023
    Exploring Hardware-Acceleration for the Linux Scheduler
    Supervisor: Tim Twardzik

2022

Bachelor's Theses

  • 29.08.2022
    Evaluating Asynchronous Communication Mechanisms in MPSoCs
    Supervisor: Tim Twardzik
  • 09.03.2022
    Reduction of the Simulation Time of the Gem5 Simulator
    Supervisor: Lars Nolte

Master's Theses

  • 20.12.2022
    Developing and Evaluating a Lightweight Hardware-accelerated Event Notification Mechanism in Linux
    Supervisor: Tim Twardzik
  • 13.12.2022
    Digital Design and Validation of a Futex Hardware Accelerator – Emulation on Zynq Ultrascale+
    Supervisor: Lars Nolte

Research Internships (Forschungspraxis)

  • 30.09.2022
    Cache Coherent Hardware Accelerator Integration into an ARM Multicore Platform with a FPGA extension
    Supervisor: Lars Nolte
  • 01.07.2022
    Developing and Evaluating a Lightweight Hardware Accelerated IPC Mechanism
    Supervisor: Tim Twardzik
  • 12.06.2022
    Performance Improvement Evaluation of Hardware Accelerated Linux Thread Wake-ups
    Supervisor: Lars Nolte

Seminars

  • 20.07.2022
    [MSEI] A survey on asynchronous event notification mechanisms in Linux systems.
    Supervisor: Lars Nolte
  • 28.01.2022
    Survey on Linux Scheduler and Options to tweak an Application’s Performance
    Supervisor: Lars Nolte

Student Assistant Jobs

  • 31.07.2022
    Hardware Accelerator Integration into an ARM Multicore Platform with a FPGA extension
    Supervisor: Lars Nolte

Interdisciplinary Projects

  • 25.07.2022
    Development of a Communication Library using Hardware-accelerated Inter-Process Communication
    Supervisor: Tim Twardzik

2021

Bachelor's Theses

  • 01.12.2021
    Integration of Performance Counter into a simulation model of a hardware accelerator in Gem5.
    Supervisor: Lars Nolte
  • 22.09.2021
    A Performance Analysis of the Linux Scheduler on ARM-based Systems
    Supervisor: Tim Twardzik
  • 15.09.2021
    Analysis of Semaphore IPC Mechanisms in Linux
    Supervisor: Tim Twardzik
  • 13.09.2021
    Setup of an ARM Multicore Platform with a FPGA extension using a Xilinx Zynq Board and a Linux OS.
    Supervisor: Lars Nolte
  • 06.07.2021
    Low-intrusive Software Tracing and Profiling using a Gem5 Simulator
    Supervisor: Lars Nolte

Master's Theses

  • 13.12.2021
    Development of a Generic Framework for Linux Task Offloading to Hardware on a Multicore Architecture.
    Supervisor: Lars Nolte

Research Internships (Forschungspraxis)

  • 20.12.2021
    Concept for Hardware-supported Scheduling in Linux
    Supervisor: Tim Twardzik
  • 15.12.2021
    Processing Simulation based Tracing Information
    Supervisor: Tim Twardzik
  • 12.05.2021
    Continuous Integration set up for a Gem5 Simulator project
    Supervisor: Lars Nolte

Publications

  • Lars Nolte, Tim Twardzik, Camille Jalier, Zhigang Huang, Jiyuan Shi, Thomas Wild, Andreas Herkersdorf: HW-FUTEX: Hardware-Assisted Futex Syscall. IEEE Transactions on Very Large Scale Integration Systems, 2023
  • Lars Nolte, Tim Twardzik, Camille Jalier, Zhigang Huang, Jiyuan Shi, Clara Kowalsky, Thomas Wild, Andreas Herkersdorf: HAWEN: Hardware Accelerator for Thread Wake-Ups in Linux Event Notification. 2023 60th ACM/IEEE Design Automation Conference (DAC), 2023
  • Lars Nolte, Tim Twardzik, Camille Jalier, Zhigang Huang, Jiyuan Shi, Thomas Wild, Andreas Herkersdorf: GLS Tracing: Gem5-based Low-intrusive Software Tracing. 2022 IEEE Nordic Circuits and Systems Conference (NorCAS), 2022