Open Theses

Important remark on this page

The following list is by no means exhaustive. There is always student work to be done in our various research projects, and many of these projects are not listed here.

Don't hesitate to email any member of the chair to ask about currently available topics in their field of research. Alternatively, you can write to this email address, which is automatically forwarded to all members of the chair.

You can also subscribe to the chair's open thesis topics list to be notified whenever a new topic is posted. Click here for the subscription.

Abbreviations:

  • PhD = PhD Dissertation
  • BA = Bachelorarbeit, Bachelor's Thesis
  • MA = Masterarbeit, Master's Thesis
  • GR = Guided Research
  • CSE = Computational Science and Engineering

Cloud Computing / Edge Computing / IoT / Distributed Systems

Function-as-a-Service (FaaS) is emerging as a popular cloud programming paradigm due to its simplicity, client-friendly cost model, and automatic scaling. In FaaS, the user implements fine-grained functions that are independently packaged, uploaded to a FaaS platform, and executed on event triggers such as HTTP requests. On invocation, the FaaS platform is responsible for providing resources to the function and for its isolation in ephemeral, stateless containers. A major problem in FaaS is the time needed to provision a new function instance on a new invocation request, i.e., the cold start, which can increase user response times and violate SLOs. Commercial cloud providers like AWS and Google mitigate this problem by keeping a certain number of function instances warm to handle future requests. However, in most cases the number of pre-provisioned function instances is set through an iterative trial-and-error approach, which can lead to increased costs and resource over-provisioning.
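
For illustration, on AWS Lambda such warm instances are configured via provisioned concurrency. A minimal boto3 sketch of the knob such a prediction tool would tune (the function name and alias are placeholders):

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 10 warm instances for a function alias; "my-function" and
# "prod" are placeholder names.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="prod",
    ProvisionedConcurrentExecutions=10,
)
```

A prediction tool as targeted by this thesis would update this value periodically based on forecasted request rates instead of a manually tuned constant.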

Goals:

1. The aim of this work is to implement a tool that predicts the optimal number of pre-provisioned function instances to mitigate function cold starts.

2. Prototype the solution on AWS and GCP.

Requirements

  • Basic knowledge of FaaS platforms. Knowledge of Knative is beneficial.
  • Knowledge of Docker and Kubernetes (K8s).
  • Experience with cloud monitoring solutions such as Prometheus.
  • Experience with load-testing tools such as k6.
  • Knowledge of forecasting and ML techniques.
  • Experience with TensorFlow or PyTorch.

We offer:

  • Thesis in an area that is in high demand in industry
  • Our expertise in data science and systems areas
  • Supervision and support during the thesis
  • Access to different systems required for the work
  • Opportunity to publish a research paper with your name on it

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de).

Background

Federated learning (FL) is a novel distributed training paradigm that enables the collaborative training of ML models across multiple data holders and addresses the fundamental problems of privacy and ownership of data. In FL, remote clients learn a shared model by optimizing its parameters on their local data and sending back the updated parameters. These local model updates are then aggregated to form the new, updated shared model.
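
For instance, with the common FedAvg rule, aggregation is a data-weighted average of the client updates; a minimal NumPy sketch, not tied to any particular FL framework:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client parameter lists into the shared model.

    client_weights: one list of np.ndarray per client (layer by layer)
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    aggregated = []
    for layer in range(len(client_weights[0])):
        # Weight each client's update by its share of the total data.
        aggregated.append(sum((n / total) * w[layer]
                              for w, n in zip(client_weights, client_sizes)))
    return aggregated
```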

Goals

  1. The aim of this work is to develop and implement a framework for comparing different FL frameworks. We have a baseline implementation that needs to be extended.
  2. The frameworks need to be tested across a large federation of clients in terms of scalability and performance.

Requirements

  • Good knowledge of ML and Deep learning.
  • Knowledge of TensorFlow and PyTorch.
  • Good Knowledge of Python.
  • English communication skills.
  • Minimum of 80 ECTS completed.

We offer:

  • Thesis in an area that is in high demand in industry.
  • Our expertise in data science and systems areas.
  • Supervision and support during the thesis.
  • Access to the LRZ cloud.
  • Opportunity to publish a research paper with your name on it.

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de).

Background: 

This thesis is in collaboration with IfTA GmbH. Details on the thesis can be found on the respective IfTA page (in German): Masterarbeit Echtzeitfähige Nutzung von mehreren Rechenkernen auf Zynq Ultrascale+ Architektur (real-time use of multiple compute cores on the Zynq UltraScale+ architecture).

Contact: roman.karlstetter@tum.de

 

Background

Federated learning (FL) enables resource-constrained edge devices to learn a shared Machine Learning (ML) or Deep Neural Network (DNN) model while keeping the training data local, providing privacy, security, and economic benefits. However, building a shared model for heterogeneous devices, ranging from resource-constrained edge devices to the cloud, makes the efficient management of FL clients challenging. Furthermore, with the rapid growth in the number of FL clients, scaling the FL training process is also difficult. At CAPS, we are working on the development of FedLess (https://arxiv.org/pdf/2111.03396.pdf), a system and framework that enables FL training across a fabric of heterogeneous devices.
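
In a FaaS-based FL system like FedLess, each client is a cloud function invoked per training round. A purely illustrative sketch (the endpoints and payload format below are hypothetical, not FedLess's actual API):

```python
import concurrent.futures
import requests

# Hypothetical HTTP-triggered FL client functions.
CLIENT_ENDPOINTS = [
    "https://example-region.cloudfunctions.net/fl-client-0",
    "https://example-region.cloudfunctions.net/fl-client-1",
]

def invoke_round(round_id):
    """Trigger one local training round on every client in parallel."""
    def call(url):
        resp = requests.post(url, json={"round": round_id}, timeout=300)
        resp.raise_for_status()
        return resp.json()  # e.g., updated parameters and sample count
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(call, CLIENT_ENDPOINTS))
```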

Goals

  1. The aim of this work is to support the development of FedLess. We have various interesting sub-topics related to its development in the scope of implementing ML algorithms, performance optimization, and software engineering.

Requirements

  • Good knowledge of ML and Deep learning.
  • Knowledge of PyTorch and TensorFlow.
  • Good Knowledge of Python.
  • English communication skills.
  • Knowledge of FaaS platforms.
  • Some experience with AWS and Google Cloud.
  • Minimum of 80 ECTS completed.

We offer:

  • Thesis in an area that is in high demand in industry.
  • Our expertise in data science and systems areas.
  • Supervision and support during the thesis.
  • Access to the LRZ cloud, AWS, and GCP.
  • Opportunity to publish a research paper with your name on it

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de).

Characterization of Benchmarks

Description:
Benchmarks are an essential tool for performance assessment of HPC systems. During the procurement process of HPC systems, both benchmarks and proxy applications are used to assess the system which is to be procured. New generations of HPC systems often serve the current and evolving needs of the applications for which the system is procured. Therefore, with new generations of HPC systems, the proxy applications and benchmarks used to assess the systems' performance are also selected for the specific needs of the system. Only a few of these have stayed persistent over longer time periods. At the same time, the quality of benchmarks is typically not questioned, as they are seen to only be representatives of specific performance indicators.

This work aims to provide a more systematic approach for evaluating benchmarks targeting the memory subsystem, looking at capacity, latency, and bandwidth.
 

Problem statement:
How can benchmarks used to assess memory performance, including cache usage, be systematically compared with one another?
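
As a point of reference, a STREAM-like triad is one of the simplest memory-bandwidth benchmarks such a comparison would have to cover; a minimal NumPy sketch:

```python
import time
import numpy as np

N = 50_000_000                 # ~400 MB per array, far exceeding caches
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)

start = time.perf_counter()
a[:] = b + 2.0 * c             # triad: two loads and one store per element
elapsed = time.perf_counter() - start

# NumPy temporaries add some extra traffic; a C kernel would be tighter.
bytes_moved = 3 * N * 8        # three float64 arrays touched once
print(f"effective bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```

A systematic comparison would relate what benchmarks of this kind actually measure (working-set sizes relative to cache capacities, access patterns, read/write mixes) rather than just their reported numbers.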

Project Description

 

Description:
Benchmarks are an essential tool for performance assessment of HPC systems. During the procurement process of HPC systems, both benchmarks and proxy applications are used to assess the system which is to be procured. With new generations of HPC systems, the selected proxy applications and benchmarks are often exchanged, and benchmarks for the specific needs of the system are selected. Only a few of these have stayed persistent over longer time periods. At the same time, the quality of benchmarks is typically not questioned, as they are seen to only be representatives of specific performance indicators.


This work aims to provide a more systematic approach for evaluating benchmarks targeting network performance, namely regarding MPI (Message Passing Interface), in both functional tests as well as in benchmark applications.


Problem statement:
How can benchmarks used to assess network performance, using MPI routines, be systematically compared with one another?
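
For illustration, the classic ping-pong pattern underlies many MPI functional tests; a minimal mpi4py sketch (run with two ranks, e.g. `mpirun -n 2 python pingpong.py`):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

msg = np.zeros(1 << 20, dtype=np.uint8)   # 1 MiB message
reps = 100

comm.Barrier()
start = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    elif rank == 1:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    # Each repetition moves the message once in each direction.
    print(f"bandwidth: {2 * reps * msg.nbytes / elapsed / 1e6:.1f} MB/s")
```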

Project Description

Description:
Benchmarks are an essential tool for performance assessment of HPC systems. During the procurement process of HPC systems, both benchmarks and proxy applications are used to assess the system which is to be procured. New generations of HPC systems often serve the current and evolving needs of the applications for which the system is procured. Therefore, with new generations of HPC systems, the proxy applications and benchmarks used to assess the systems' performance are also selected for the specific needs of the system. Only a few of these have stayed persistent over longer time periods. At the same time, the quality of benchmarks is typically not questioned, as they are seen to only be representatives of specific performance indicators.


This work aims to provide a systematic approach for evaluating benchmarks that target input and output (I/O) performance, i.e., read and write performance with the different characteristics seen in application behavior and mimicked by benchmarks.
 

Problem statement:
How can benchmarks used to assess I/O performance be systematically compared with one another?
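
For illustration, even a sequential-write measurement already exposes several of the characteristics (access size, flushing behavior) along which I/O benchmarks differ; a minimal sketch:

```python
import os
import time

SIZE = 1 << 30      # 1 GiB total
CHUNK = 1 << 22     # 4 MiB per write; one of many possible access sizes
buf = os.urandom(CHUNK)

start = time.perf_counter()
with open("testfile.bin", "wb") as f:
    for _ in range(SIZE // CHUNK):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())   # ensure data actually reaches the device
elapsed = time.perf_counter() - start

print(f"write throughput: {SIZE / elapsed / 1e6:.1f} MB/s")
os.remove("testfile.bin")
```

Established I/O benchmarks additionally vary access patterns, concurrency, and read/write mixes; the systematic comparison targeted here would map benchmarks along exactly such dimensions.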

Thesis Description

Resource Management for Supercomputing

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both scientific and engineering perspectives. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms including power and temperature controls, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling the job/resource scales).

Research Summary:

In this work, we will focus on co-scheduling and power management on HPC systems, with a particular focus on heterogeneous computing nodes consisting of multiple different processors (CPU, GPU, etc.) or memory technologies (DRAM, NVRAM, etc.). Recent hardware components generally support a variety of resource partitioning and power control features, such as bandwidth partitioning, compute resource partitioning, clock scaling, and power capping. Our goal in this study is to provide a sophisticated mechanism to comprehensively optimize these various hardware setups, as well as the selection of co-located jobs from a given job set, so that a given objective function (e.g., total throughput) is maximized. For this, we will develop the following: (1) several models (possibly based on machine learning) to predict power, performance, interference, etc., as functions of the hardware setup and the set of co-located jobs; (2) algorithms to optimize the hardware setups and the job selections from a job queue based on the developed models.
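
As a toy illustration of step (2), a sketch of selecting the co-run pair that maximizes predicted throughput, assuming a hypothetical pairwise interference model (real models would come from step (1)):

```python
import itertools

# Hypothetical solo throughputs and pairwise interference factors
# (0 = no slowdown, 1 = complete slowdown).
solo = {"lulesh": 1.0, "hpcg": 0.8, "lammps": 0.9}
interference = {pair: 0.2 for pair in itertools.permutations(solo, 2)}

def predicted_throughput(job_a, job_b):
    """Combined throughput of two co-located jobs under the model."""
    return (solo[job_a] + solo[job_b]) * (1.0 - interference[(job_a, job_b)])

best = max(itertools.combinations(solo, 2),
           key=lambda pair: predicted_throughput(*pair))
print("best co-schedule:", best)
```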

Notes:

  • Due to time limitations, you may tackle a subproblem, such as optimizing resource partitioning on GPUs (e.g., A100), power budgeting across different components, or developing a hardware-agnostic power/performance model, any of which would ultimately be a great contribution to the project.
  • There are no formal requirements for this topic, but parallel programming and GPU experience/skills will help.
  • You will work together with all the members of the Regale project in this chair, and the discussions will be in English. 

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both scientific and engineering perspectives. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms including power and temperature controls, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling the job/resource scales).

Thesis Summary:

In this thesis, we will focus on co-scheduling and power management on HPC clusters, mainly from the job scheduler side (i.e., Slurm, https://slurm.schedmd.com), and will first examine the variety of features supported by the current production-level software stack (i.e., Slurm plus several extensions) on real hardware. The next step will then be one or more of the following, depending on your preferences: (1) list the missing pieces in the software stack needed to realize sophisticated co-scheduling and power management features, and provide architecture-level solutions to realize them; (2) pick one (or more) of the missing features and extend the existing software stack to support it; or (3) propose a job scheduling algorithm to fully exploit the currently supported co-scheduling or power management features (or your newly implemented ones). If necessary, we will also use job scheduling simulators to test our ideas.

Notes:

  • The research outcomes will provide valuable feedback to the Regale project for the overall software integration and architecture design; your work will thus be a significant contribution to the project.
  • There are no formal requirements for this topic, but any parallel programming and HPC cluster management experience/skills will help.
  • You will work together with all the members of the Regale project in this chair, and the discussions will be in English.

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both scientific and engineering perspectives. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms including power and temperature controls, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling the job/resource scales).

Thesis Summary:

In this thesis, we will focus on workflow engines (e.g., Melissa, https://gitlab.inria.fr/melissa/melissa) and our resource management software stack (incl. Slurm, https://slurm.schedmd.com), and explore the benefits of coordinating them to improve total system throughput, energy efficiency, and other aspects. Workflow engines are useful for running scientific simulations efficiently while changing inputs, conditions, parameters, etc., and Melissa in particular supports several advanced features such as fault tolerance, automatic concurrency handling, and online neural network training. Our goals in this study are: (1) optimizing job scheduling and power/resource management while being explicitly aware of the behavior and characteristics of such workflow-based jobs; and (2) interacting with the workflow engine accordingly and providing the right interface to it for this purpose.

Notes:

  • The research outcomes will provide valuable feedback to the Regale project for the overall software integration and architecture design; your work will thus be a significant contribution to the project.
  • There are no formal requirements for this topic, but any parallel programming and HPC cluster management experience/skills will help.
  • You will work together with all the members of the Regale project in this chair, and the discussions will be in English.

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

Dynamic Resource Management for HPC systems

Resource management on HPC systems needs to become more dynamic to fully exploit the resources of modern and future large-scale HPC systems. Contrary to static resource management, with dynamic resource management the resource assignments of applications are not fixed a priori and can change during their execution. To enable this approach on HPC systems, research is required across the full system software stack, including applications, programming models, resource management & scheduling.

Various student projects related to Dynamic Resource Management for HPC systems are available, including (but not limited to!) the topics listed below.

If you are interested in one of the listed topics, or in Dynamic Resource Management in general, please send an email to: Dominik Huber (domi.huber@tum.de)

Background: Dynamic resource management promises better utilization of resources in HPC systems. However, this requires significant changes to current scheduling strategies, as the dynamically varying resource requirements and utilization of running applications need to be taken into account. So far, there exists only limited knowledge about efficient scheduling strategies in such scenarios.

Thesis Goal: The goal of this thesis is to explore new scheduling strategies in a dynamic resource management scenario. To this end, a Python-based mini-scheduler will be implemented to test different scheduling strategies based on application-provided performance models (see the sketch below).
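
A sketch of what the core of such a mini-scheduler might look like, assuming a hypothetical perfectly-scaling performance model and an FCFS baseline against which dynamic strategies could then be compared:

```python
import heapq

class Job:
    def __init__(self, work, max_nodes):
        self.work, self.max_nodes = work, max_nodes

    def runtime(self, nodes):
        # Hypothetical application-provided performance model:
        # perfect scaling up to max_nodes.
        return self.work / min(nodes, self.max_nodes)

def simulate_fcfs(jobs, total_nodes):
    """Event-driven FCFS baseline (assumes max_nodes <= total_nodes).
    A dynamic strategy would additionally resize running jobs."""
    time, free, queue, running = 0.0, total_nodes, list(jobs), []
    while queue or running:
        # Start queued jobs while enough nodes are free.
        while queue and queue[0].max_nodes <= free:
            job = queue.pop(0)
            free -= job.max_nodes
            heapq.heappush(running,
                           (time + job.runtime(job.max_nodes), job.max_nodes))
        # Advance to the next completion and reclaim its nodes.
        finish, nodes = heapq.heappop(running)
        time, free = finish, free + nodes
    return time  # makespan

print(simulate_fcfs([Job(100, 8), Job(50, 4), Job(50, 4)], 8))
```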

Contact: Dominik Huber (domi.huber@tum.de), Prof. Martin Schulz, Prof. Martin Schreiber

Background: Scheduling resources on systems with dynamic resource management is a challenging task. One important aspect in this context is the description of the dynamic resource requirements and performance behavior of applications as input to dynamic scheduling strategies.

Thesis Goal: The goal of this thesis is to develop a Domain Specific Language to express dynamic resource requirements of HPC applications. A next step would then be the development of scheduling strategies based on the provided data.
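
Purely as an illustration of the kind of information such a DSL might capture (the syntax below is hypothetical, embedded here as a Python structure):

```python
# Hypothetical description of one application's dynamic requirements:
# per-phase node corridors, a scaling hint, and safe resize points.
requirements = {
    "phases": [
        {"name": "setup",    "nodes": {"min": 1,  "max": 1}},
        {"name": "solver",   "nodes": {"min": 16, "max": 128},
         "scaling": "work / nodes**0.9"},     # performance-model hint
        {"name": "analysis", "nodes": {"min": 4, "max": 8}},
    ],
    "resize_points": ["between_iterations"],  # when resizing is safe
}
```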

Contact: Dominik Huber (domi.huber@tum.de), Prof. Martin Schulz, Prof. Martin Schreiber

Background: Many approaches for dynamic resource management are tailored to specific programming models. To account for the diversity of programming models used by malleable applications, a general, programming model-agnostic abstraction for the description of (dynamic) resources is desirable.

Thesis Goal: The goal of this thesis is to develop an emulator for this (resource-)set approach (e.g., in Python) and to demonstrate its usability with different models of application use cases.

Contact: Dominik Huber (domi.huber@tum.de), Prof. Martin Schulz, Prof. Martin Schreiber

Background: ExaMPI is an MPI implementation developed at the University of Tennessee at Chattanooga with the goal of enabling rapid prototyping of new MPI ideas.

Thesis Goal: This thesis will focus on prototyping ideas for dynamic MPI (such as MPI Sessions and/or PMIx) with the ExaMPI implementation.

Contact: Dominik Huber (domi.huber@tum.de), Prof. Martin Schulz, Prof. Martin Schreiber

Background: Flex-MPI is a library implemented on top of MPICH and provides dynamic load balancing and malleability capabilities to MPI applications. Extensions are needed to make Flex-MPI compatible with MPI Sessions and to facilitate portable integration into HPC software stacks.

Thesis Goal: This thesis will focus on integrating MPI Sessions and/or PMIx into the Flex-MPI library to improve its flexibility and interoperability.

Contact: Dominik Huber (domi.huber@tum.de), Prof. Martin Schulz, Prof. Martin Schreiber

Memory Management and Optimizations on Heterogeneous HPC Architectures

Background

sys-sage (https://github.com/caps-tum/sys-sage) is a library for capturing and manipulating the hardware topology of compute systems and their attributes. It collects, stores, and provides different kinds of information regarding an HPC node, heterogeneous chips such as CPUs or GPUs, and their components, such as caches, cores, or thread blocks. This information is needed by various users in the areas of scheduling, power management, and performance optimization, to name a few examples.

On the other hand, supercomputers consist of many nodes with identical hardware and characteristics, and therefore one would expect their performance-related attributes to be the same (or within a margin of error). This is a crucial quality of HPC nodes because very often tasks are split equally among the nodes; if one node is slower, the whole execution is slowed down. Deviations can happen due to many factors, such as hardware faults, poor system design, misconfiguration of a node, or previous jobs not being cleaned up properly. We want to be able to test such HPC systems to confirm this uniformity, or to identify the outliers so that they can be fixed.

Thesis

In this thesis, we will focus on extending sys-sage with a mechanism to compare selected attributes of different components (HPC nodes, CPU sockets, etc.). The collection of the relevant data is already handled by sys-sage. The goal of this thesis is to design and develop additional functionality that can compare the acquired data (from multiple nodes) and automatically evaluate it, so that outliers can be identified (see the sketch after the task list below). This will be done mainly at node granularity, but the functionality should be extendable to any granularity, such as socket or NUMA domain.

Tasks

The expected workflow is as follows:

  1. Extend the existing sys-sage XML export with the counterpart import, so that data from multiple nodes can be put together.
  2. Create an interface to specify parameters of the comparison.
  3. Test the functionality on real production HW, such as the SuperMUC-NG supercomputer.
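
For the evaluation step, the comparison could, for instance, flag nodes whose attribute values deviate strongly from the fleet average. A minimal statistical sketch, independent of the actual sys-sage C++ API:

```python
import statistics

def find_outliers(values_per_node, threshold=1.5):
    """Flag nodes whose attribute (e.g., measured memory bandwidth in
    GB/s) deviates by more than `threshold` standard deviations."""
    mean = statistics.mean(values_per_node.values())
    stdev = statistics.stdev(values_per_node.values())
    if stdev == 0:
        return []   # perfectly uniform fleet
    return [node for node, v in values_per_node.items()
            if abs(v - mean) > threshold * stdev]

print(find_outliers({"node1": 204.8, "node2": 205.1, "node3": 204.5,
                     "node4": 205.0, "node5": 151.0}))
```

More robust criteria (e.g., median absolute deviation) and per-component granularity would be part of the design work.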

Contact
In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz) and attach your CV & transcript of records.

Published on 12.09.2022
 

Background:

The DEEP-SEA (https://www.deep-projects.eu) project is a joint European effort of about a dozen leading universities and research institutions to develop software for coming Exascale supercomputing architectures. CAPS TUM, as a member of the project, is responsible for developing an environment for analyzing application and system performance in terms of data movement. Data movement is very costly compared to computation. Therefore, suboptimal memory access patterns in an application can have a huge negative impact on overall performance. Conversely, analyzing and optimizing data movement can massively increase the overall performance of parallel applications.

We are developing a toolchain with the goal of creating a full, in-depth analysis of memory-related application behaviour. It consists of the tools Mitos (https://github.com/caps-tum/mitos), sys-sage (https://github.com/caps-tum/sys-sage), and MemAxes (https://github.com/caps-tum/MemAxes). Mitos collects information about memory accesses, sys-sage captures the memory and compute topology and capabilities of a system and provides the link between the hardware and the performance data, and finally, MemAxes analyzes and visualizes the outputs of the aforementioned projects.

There is an existing proof of concept of these tools, and we plan to extend and improve the projects massively to fit the needs of state-of-the-art and future HPC systems, which are expected to form the core of upcoming Exascale supercomputers. Our work and research touch on modern heterogeneous architectures, patterns, and designs, and aim at enabling users to run extreme-scale applications while utilizing as much of the underlying hardware as possible.

Context:

  • The current implementation of Mitos/MemAxes collects PEBS samples of memory accesses (via perf), i.e. every n-th memory operation is measured and stored.
  • Collecting aggregate data alongside the PEBS samples could help increase the overall understanding of system and application behaviour.

Tasks/Goals: 

  • Analyse what aggregate data are meaningful and feasible to collect (total traffic, bandwidth utilization, number of loads/stores, ...?) and how to collect them (PAPI? LIKWID? perf? See the sketch below.)
  • Ensure that these measurements don't interfere with the existing collection of PEBS samples.
  • Design and implement a low-overhead solution.
  • Find a way to visualise/present the data in the MemAxes tool (or a different visualisation tool if MemAxes is not suitable).
  • Finally, present how the newly collected data help users to understand the system, or hint to the user if/how to apply optimizations.
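
As a starting point for the first task, aggregate counters can be gathered by running the application under `perf stat`; a sketch (event names are architecture-dependent placeholders, and the CSV parsing is simplified):

```python
import subprocess

def aggregate_counts(cmd, events=("instructions", "LLC-load-misses")):
    """Run `cmd` under perf stat and return total counts per event."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(events)] + list(cmd),
        capture_output=True, text=True)
    counts = {}
    # In CSV mode, perf prints one "value,unit,event,..." line per
    # event on stderr.
    for line in result.stderr.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and fields[0].strip().isdigit():
            counts[fields[2]] = int(fields[0])
    return counts

print(aggregate_counts(["sleep", "1"]))
```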

Contact:

In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz).
 

Updated on 12.09.2022

Various MPI-Related Topics

Please Note: MPI is a high-performance programming model and communication library designed for HPC applications. It is designed and standardised by the members of the MPI Forum, which includes various research, academic, and industrial institutions. The current chair of the MPI Forum is Prof. Dr. Martin Schulz.

The following topics are all available as Master's Thesis and Guided Research. They will be advised and supervised by Prof. Dr. Martin Schulz himself, with the help of researchers from the chair. If you are very familiar with MPI and parallel programming, please don't hesitate to drop a mail to Prof. Dr. Martin Schulz.

These topics are mostly related to current research and active discussions in the MPI Forum, which are subject to standardisation in the coming years. Your contributions to these topics may make you a contributor to the MPI standard, and your implementation may become part of the code base of Open MPI. Many of these topics require collaboration with other MPI research bodies, such as Lawrence Livermore National Laboratory and the Innovative Computing Laboratory. Some of these topics may require you to attend MPI Forum meetings, which take place in the late afternoon (to accommodate participants worldwide). Generally, these advanced topics may require more effort to understand and may be more time-consuming, but they are more prestigious, too.

LAIK is a new programming abstraction developed at LRR-TUM

  • Decouple data decomposition and computation, while hiding communication
  • Applications work on index spaces
  • Mapping of index spaces to nodes can be adaptive at runtime
  • Goal: dynamic process management and fault tolerance
  • Current status: works on standard MPI, but no dynamic support

Task 1: Port LAIK to Elastic MPI

  • New model developed locally that allows process additions and removal
  • Should be very straightforward

Task 2: Port LAIK to ULFM

  • Proposed MPI FT Standard for “shrinking” recovery, prototype available
  • Requires refactoring of code and evaluation of ULFM

Task 3: Compare performance with direct implementations of the same models on MLEM

  • Medical image reconstruction code
  • Requires porting MLEM to both Elastic MPI and ULFM

Task 4: Comprehensive Evaluation

ULFM (User-Level Fault Mitigation) is the current proposal for MPI Fault Tolerance

  • Failures make communicators unusable
  • Once detected, communicators can be "shrunk"
  • Detection is active and synchronous, by capturing error codes (illustrated in the sketch below)
  • Shrinking is collective, typically after a global agreement
  • Problem: can lead to deadlocks
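
A minimal mpi4py illustration of this synchronous, error-code-based detection style (actual shrinking requires an ULFM-capable MPI build; the recovery itself is only indicated in the comment):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Default is ERRORS_ARE_FATAL; return error codes instead, so the
# application can detect failures synchronously.
comm.Set_errhandler(MPI.ERRORS_RETURN)

if comm.Get_size() >= 2:
    try:
        if comm.Get_rank() == 0:
            comm.send(b"ping", dest=1)
        elif comm.Get_rank() == 1:
            comm.recv(source=0)
    except MPI.Exception as err:
        # Under ULFM, the error class would indicate a failed process
        # and the survivors would collectively shrink the communicator.
        print("communication failed:", err.Get_error_string())
```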

Alternative idea

  • Make shrinking lazy and with that non-collective
  • New, smaller communicators are created on the fly

Tasks:

  • Formalize non-collective shrinking idea
  • Propose API modifications to ULFM
  • Implement prototype in Open MPI
  • Evaluate performance
  • Create proposal that can be discussed in the MPI forum

ULFM works on the classic MPI assumptions

  • Complete communicator must be working
  • No holes in the rank space are allowed
  • Collectives always work on all processes

Alternative: break these assumptions

  • A failure creates communicator with a hole
  • Point to point operations work as usual
  • Collectives work (after acknowledgement) on reduced process set

Tasks:

  • Formalize "hole-y" shrinking
  • Propose new API
  • Implement prototype in Open MPI
  • Evaluate performance
  • Create proposal that can be discussed in the MPI Forum

With MPI 3.1, MPI added a second tools interface: MPI_T

  • Access to internal variables 
  • Query, read, write
  • Performance and configuration information
  • Missing: event information using callbacks
  • New proposal in the MPI Forum (driven by RWTH Aachen)
  • Add event support to MPI_T
  • Proposal is rather complete

Tasks:

  • Implement prototype in either Open MPI or MVAPICH
  • Identify a series of events that are of interest
  • Message queuing, memory allocation, transient faults, …
  • Implement events for these through MPI_T
  • Develop tool using MPI_T to write events into a common trace format
  • Performance evaluation

Possible collaboration with RWTH Aachen

 

PMIx is a proposed resource management layer for runtimes (for Exascale)

  • Enables MPI runtime to communicate with resource managers
  • Came out of previous PMI efforts as well as the Open MPI community
  • Under active development / prototype available on Open MPI

Tasks: 

  • Implement PMIx on top of MPICH or MVAPICH
  • Integrate PMIx into SLURM
  • Evaluate implementation and compare to Open MPI implementation
  • Assess and possibly extend interfaces for tools
  • Query process sets

MPI was originally intended as runtime support, not as an end-user API

  • Several other programming models use it that way
  • However, often not first choice due to performance reasons
  • Especially task/actor based models require more asynchrony

Question: can more asynchronous models be added to MPI?

  • Example: active messages

Tasks:

  • Understand communication modes in an asynchronous model
  • Charm++: actor-based (UIUC)
  • Legion: task-based (Stanford, LANL)
  • Propose extensions to MPI that capture this model better
  • Implement prototype in Open MPI or MVAPICH
  • Evaluation and Documentation

Possible collaboration with LLNL and/or BSC

MPI can and should be used for more than Compute

  • Could be runtime system for any communication
  • Example: traffic to visualization / desktops

Problem:

  • Different network requirements and layers
  • May require different MPI implementations
  • Common protocol is unlikely to be accepted

Idea: can we use a bridge node with two MPIs linked to it

  • User should see only two communicators, but same API

Tasks:

  • Implement this concept coupling two MPIs
  • Open MPI on compute cluster and TCP MPICH to desktop
  • Demonstrate using on-line visualization streaming to front-end
  • Document and provide evaluation
  • Warning: likely requires good understanding of linkers and loaders

Field-Programmable Gate Arrays

Field Programmable Gate Arrays (FPGAs) are considered to be the next generation of accelerators. Their advantages range from improved energy efficiency for machine learning to faster routing decisions in network controllers. If you are interested in one of these topics, please send your CV and transcript of records to the specified email address.

Our chair offers various topics available in this area:

  • Direct network operations: Here, FPGAs are wired closer to the networking hardware itself, which allows bypassing the network stack that regular CPU-style communication would be exposed to. Your task would be to investigate FPGAs that can interact with the network more closely than CPU-based approaches. ( martin.schreiber@tum.de )
  • Linear algebra: Your task would be to explore strategies to accelerate existing linear algebra routines on FPGA systems by taking into account applications requirements. ( martin.schreiber@tum.de )
  • Varying accuracy of computations: The granularity of current floating-point computations is 16, 32, or 64 bit. Your work would be on tailoring the accuracy of computations towards what's really required. ( martin.schreiber@tum.de )
  • ODE solver: You would work on an automatic toolchain for solving ODEs originating from computational biology. ( martin.schreiber@tum.de )

 

Various Thesis Topics in Collaboration with Leibniz Supercomputing Centre

Contact: amir.raoofy@lrz.de and josef.weidendorfer@lrz.de

Contact: michael.ott@lrz.de and amir.raoofy@lrz.de 

Contact: amir.raoofy@lrz.de

Contact: amir.raoofy@lrz.de

Contact: amir.raoofy@lrz.de and josef.weidendorfer@lrz.de

Applied mathematics & high-performance computing

There are various topics available in the area bridging applied mathematics and high-performance computing. Please note that this will be supervised externally by Prof. Dr. Martin Schreiber (a former member of this chair, now at Université Grenoble Alpes).

This is just a selection of some topics to give some inspiration:

(MA=Master in Math/CS, CSE=Comput. Sc. and Engin.)

  • HPC tools:
    • Automated Application Performance Characteristics Extraction
    • Portable performance assessment for programs with flat performance profile, BA, MA, CSE
  • Projects targeting Weather (and climate) forecasting
    • Implementation and performance assessment of ML-SDC/PFASST in OpenIFS (collaboration with the European Center for Medium-Range Weather Forecast), CSE, MA
    • Efficient realization of fast Associated Legendre transformations on GPUs (collaboration with the European Center for Medium-Range Weather Forecast), CSE, MA
    • Fast exponential and implicit time integration, BA, MA, CSE
    • MPI parallelization for the SWEET research software, MA, CSE
    • Semi-Lagrangian methods with Parareal, CSE, MA
    • Non-interpolating Semi-Lagrangian Schemes, CSE, MA
    • Time-splitting methods for exponential integrators, CSE, MA
    • Machine learning for non-linear time integration, CSE, MA
    • Exponential integrators and higher-order Semi-Lagrangian methods

  • Ocean simulations:
    • Porting the NEMO ocean simulation framework to GPUs with a source-to-source compiler
    • Porting the Croco ocean simulation framework to GPUs with a source-to-source compiler
       
  • Health science project: Biological parameter optimization
    • Extending a domain-specific language with time integration methods
    • Performance assessment and improvements for different hardware backends (GPUs / FPGAs / CPUs)

If you're interested in any of these projects, or if you are looking for other projects in this area, please drop me an email for further information.

In-Situ/In-Transit Data Transformation Using Low-Power Processors