Open Theses

Important remark on this page

The following list is by no means exhaustive. There is always student work to be done in our various research projects, and many of these projects are not listed here.

Don't hesitate to email any member of the chair to ask about currently available topics in their field of research. Alternatively, you can write to this email address, which will automatically be forwarded to all members of the chair.

You can also subscribe to the chair's open thesis topics list to be notified whenever a new topic is posted. Click here to subscribe.

Abbreviations:

  • PhD = PhD Dissertation
  • BA = Bachelorarbeit, Bachelor's Thesis
  • MA = Masterarbeit, Master's Thesis
  • GR = Guided Research
  • CSE = Computational Science and Engineering

Cloud Computing / Edge Computing / IoT / Distributed Systems

Function-as-a-Service (FaaS) is emerging as a popular cloud programming paradigm due to its simplicity, client-friendly cost model, and automatic scaling. In FaaS, users implement fine-grained functions that are independently packaged, uploaded to a FaaS platform, and executed on event triggers such as HTTP requests. On invocation, the FaaS platform is responsible for providing resources to the function and for isolating it in ephemeral, stateless containers. A major problem in FaaS is the time needed to provision a new function instance when an invocation request arrives, the so-called cold start, which can increase user response times and violate SLOs. Commercial cloud providers such as AWS and Google mitigate this problem by keeping a certain number of function instances warm to handle future requests. However, in most cases the number of pre-provisioned function instances is set through an iterative trial-and-error approach, which can lead to increased costs and resource over-provisioning.
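For a first intuition, the sketch below shows a minimal, illustrative predictor of how many instances to keep warm, based on simple exponential smoothing of the observed invocation rate. All class and parameter names are our own assumptions; the thesis would replace this with proper forecasting/ML models.

```cpp
#include <cmath>

// Minimal, illustrative sketch of a keep-alive predictor: smooth the
// observed invocation rate exponentially and derive how many instances
// to keep warm. All names and parameters are our own assumptions; a
// real solution would use proper forecasting/ML models instead.
class WarmInstancePredictor {
public:
    WarmInstancePredictor(double alpha, double perInstanceRps)
        : alpha_(alpha), perInstanceRps_(perInstanceRps) {}

    // Feed the invocation rate observed in the last interval (e.g., 1 min).
    void observe(double invocationsPerSecond) {
        smoothed_ = initialized_
            ? alpha_ * invocationsPerSecond + (1.0 - alpha_) * smoothed_
            : invocationsPerSecond;
        initialized_ = true;
    }

    // Number of instances to keep warm for the next interval.
    int recommendedWarmInstances() const {
        return initialized_
            ? static_cast<int>(std::ceil(smoothed_ / perInstanceRps_))
            : 0;
    }

private:
    double alpha_;           // smoothing factor in (0, 1]
    double perInstanceRps_;  // requests/s one warm instance can absorb
    double smoothed_ = 0.0;  // smoothed invocation rate
    bool initialized_ = false;
};
```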

Goals:

1. Implement a tool that predicts the optimal number of pre-provisioned function instances to mitigate function cold starts.

2. Prototype the solution on AWS and GCP.

Requirements

  • Basic knowledge of FaaS platforms; knowledge of Knative is beneficial.
  • Knowledge of Docker and Kubernetes (K8s).
  • Experience with cloud monitoring solutions such as Prometheus.
  • Experience with a load-testing tool such as k6.
  • Knowledge of forecasting and ML techniques.
  • Experience with TensorFlow or PyTorch.

We offer:

  • A thesis in an area that is in high demand in industry
  • Our expertise in data science and systems areas
  • Supervision and support during the thesis
  • Access to different systems required for the work
  • Opportunity to publish a research paper with your name on it

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de).

Background

With the rise of the microservice architecture for building cloud-based applications, adopted for its agility, scalability, and resiliency, and of container-based deployment, DevOps has been in demand for handling development and operations together. Nowadays, however, serverless computing offers a new way of developing and deploying cloud-native applications. Serverless computing, also called NoOps, offloads management and server configuration (the operations work) from the user to the cloud provider and lets the user focus solely on product development. Hence, there is an ongoing debate about which deployment strategy to use.

Goals:

  1. The aim of this work is to port a suite of microservice applications to a serverless architecture, particularly Function-as-a-Service (FaaS).
  2. Compare the two application deployment architectures with respect to scaling and costs.

Requirements

  • Good knowledge of microservices.
  • Basic knowledge of FaaS platforms; knowledge of Knative is beneficial.
  • Knowledge of Docker and Kubernetes (K8s).
  • Experience with a load-testing tool such as k6.
  • Knowledge of GKE or AKS.
  • Basic knowledge of gRPC and Thrift.

We offer:

  • A thesis in an area that is in high demand in industry
  • Our expertise in data science and systems areas
  • Supervision and support during the thesis
  • Access to different systems required for the work
  • Opportunity to publish a research paper with your name on it

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de).

Background: 

This thesis is in collaboration with IfTA GmbH. Details can be found on the respective IfTA page (in German): Masterarbeit Echtzeitfähige Nutzung von mehreren Rechenkernen auf Zynq Ultrascale+ Architektur (Master's thesis: real-time use of multiple compute cores on the Zynq UltraScale+ architecture)

Contact: roman.karlstetter@tum.de

 

Background: 

Sensor data streams generate large volumes of time series data, so it is important that the data can be accessed quickly and stored efficiently. As part of the SensE project (see https://sense.caps.in.tum.de), we are evaluating different ways to improve storage efficiency and performance.

Description: 

Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It specifies an encoding well suited to storing timestamps of time series data: Delta Encoding (DELTA_BINARY_PACKED). However, the C++ implementation in Apache Arrow does not yet support this encoding for writing data, and the read path is not fully optimized. Your task is to efficiently implement, optimize, and vectorize this delta encoding in the C++ implementation of Apache Parquet inside the Apache Arrow project and to evaluate it on different processor architectures.
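For intuition, the following simplified sketch shows the core idea of DELTA_BINARY_PACKED for a single block: store the first value, then bit-pack the deltas after subtracting the minimum delta. It deliberately ignores the real format's header layout, mini-blocks, and ULEB128/zigzag encoding, and it is not the Arrow code itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified sketch of the DELTA_BINARY_PACKED idea for a single block:
// store the first value, then bit-pack (delta - min_delta) for the rest.
// The real format adds ULEB128/zigzag headers and a mini-block structure.
struct DeltaBlock {
    int64_t firstValue;
    int64_t minDelta;
    uint8_t bitWidth;              // bits per packed delta
    std::vector<uint64_t> packed;  // bit-packed (delta - minDelta) values
};

static uint8_t bitsNeeded(uint64_t v) {
    uint8_t n = 0;
    while (v) { ++n; v >>= 1; }
    return n;
}

// Assumes values is non-empty; overflow handling omitted for brevity.
DeltaBlock deltaEncode(const std::vector<int64_t>& values) {
    DeltaBlock block{values.front(), 0, 0, {}};
    std::vector<int64_t> deltas;
    for (std::size_t i = 1; i < values.size(); ++i)
        deltas.push_back(values[i] - values[i - 1]);
    if (deltas.empty()) return block;

    block.minDelta = *std::min_element(deltas.begin(), deltas.end());
    uint64_t maxAdjusted = 0;
    for (int64_t d : deltas)
        maxAdjusted = std::max(maxAdjusted, uint64_t(d - block.minDelta));
    block.bitWidth = bitsNeeded(maxAdjusted);
    if (block.bitWidth == 0) return block;  // all deltas are identical

    // Pack each adjusted delta into a contiguous little-endian bit stream.
    block.packed.assign((deltas.size() * block.bitWidth + 63) / 64, 0);
    uint64_t bitPos = 0;
    for (int64_t d : deltas) {
        uint64_t adj = uint64_t(d - block.minDelta);
        std::size_t word = bitPos / 64, off = bitPos % 64;
        block.packed[word] |= adj << off;
        if (off + block.bitWidth > 64)              // value spills over
            block.packed[word + 1] |= adj >> (64 - off);
        bitPos += block.bitWidth;
    }
    return block;
}
```

Vectorizing exactly this inner packing/unpacking loop (e.g., with SIMD intrinsics per mini-block) is where most of the optimization potential lies.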

Your Tasks:

  1. Implement Delta-Encoding write path for Apache Parquet files in the Apache Arrow project (more info: https://parquet.apache.org/docs/file-format/data-pages/encodings/#a-namedeltaencadelta-encoding-delta_binary_packed--5).
  2. Optimize/vectorize both read and write path for Delta-Encoding.
  3. Measure read and write throughput of your implementation on a typical time series dataset, as well as encoding efficiency.

Contact: roman.karlstetter@tum.de

 

Supercomputing and Intra-/Inter-Node Resource Management

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both scientific and engineering perspectives. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms, including power and temperature control, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling job/resource scales).

Research Summary:

In this work, we will focus on co-scheduling and power management on HPC systems, with a particular focus on heterogeneous compute nodes consisting of multiple different processors (CPU, GPU, etc.) or memory technologies (DRAM, NVRAM, etc.). Recent hardware components generally support a variety of resource partitioning and power control features, such as bandwidth partitioning, compute resource partitioning, clock scaling, and power capping. Our goal in this study is to provide a sophisticated mechanism to comprehensively optimize these various hardware setups, as well as the selection of co-located jobs from a given job set, so that a given objective function (e.g., total throughput) is maximized. For this, we will develop the following: (1) several models (possibly based on machine learning) to predict power, performance, interference, etc., as functions of the hardware setup and the set of co-located jobs; (2) algorithms to optimize the hardware setup and the job selection from a job queue based on the developed models.
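As an informal formalization (the symbols below are ours, not fixed by the project), the optimization problem could be stated as:

```latex
% Informal sketch: choose a set S of co-located jobs from the queue Q and
% a hardware configuration c (partitioning, clock/power caps, ...) from
% the configuration space C such that an objective f (e.g., aggregate
% throughput) is maximized under a node power budget P_max:
\begin{equation}
  \max_{S \subseteq Q,\; c \in C} f(S, c)
  \quad \text{s.t.} \quad \mathrm{Power}(S, c) \le P_{\max}
\end{equation}
% where f and Power are supplied by the predictive models in step (1).
```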

Notes:

  • Due to time limitations, you may tackle a subproblem, such as optimizing resource partitioning on GPUs (e.g., A100), power budgeting across different components, or developing hardware-agnostic power/performance models; any of these would ultimately be a great contribution to the project.
  • There are no formal requirements for this topic, but parallel programming and GPU experience/skills will help.
  • You will work together with all the members of the Regale project in this chair, and the discussions will be in English. 

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both scientific and engineering perspectives. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms, including power and temperature control, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling job/resource scales).

Thesis Summary:

In this thesis, we will focus on co-scheduling and power management on HPC clusters, mainly from the job scheduler side (i.e., Slurm, https://slurm.schedmd.com), and will first examine the variety of features supported by the current production-level software stack (i.e., Slurm plus several extensions) on real hardware. The next step will then be one or more of the following, depending on your preferences: (1) list the missing pieces in the software stack needed to realize sophisticated co-scheduling and power management features, and provide architecture-level solutions for them; (2) pick one (or more) of the missing features and extend the existing software stack to support it; or (3) propose a job scheduling algorithm that fully exploits the currently supported co-scheduling and power management features (or your newly implemented ones). If necessary, we will also use job scheduling simulators to test our ideas.

Notes:

  • The research outcomes will provide valuable feedback to the Regale project for the overall software integration and architecture design; your work will thus be a significant contribution to the project.
  • There are no formal requirements for this topic, but any parallel programming and HPC cluster management experience/skills will help.
  • You will work together with all the members of the Regale project in this chair, and the discussions will be in English.

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both scientific and engineering perspectives. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms, including power and temperature control, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling job/resource scales).

Thesis Summary:

In this thesis, we will focus on workflow engines (e.g., Melissa, https://gitlab.inria.fr/melissa/melissa) and our resource management software stack (incl. Slurm, https://slurm.schedmd.com), and explore the benefits of coordinating them to improve total system throughput, energy efficiency, and other aspects. Workflow engines are useful for running scientific simulations efficiently while varying inputs, conditions, parameters, etc.; Melissa in particular supports several advanced features such as fault tolerance, automatic concurrency handling, and online neural network training. Our goals in this study are: (1) to optimize job scheduling and power/resource management while being explicitly aware of the behavior and characteristics of such workflow-based jobs; and (2) to interact with the workflow engine accordingly and provide the right interface to it for this purpose.

Notes:

  • The research outcomes will provide valuable feedback to the Regale project for the overall software integration and architecture design; your work will thus be a significant contribution to the project.
  • There are no formal requirements for this topic, but any parallel programming and HPC cluster management experience/skills will help.
  • You will work together with all the members of the Regale project in this chair, and the discussions will be in English.

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

Memory Management and Optimizations on Heterogeneous HPC Architectures

sys-sage (https://github.com/stepanvanecek/sys-sage) is a library for capturing and manipulating the hardware topology of compute systems and its attributes. It collects, stores, and provides different kinds of information regarding an HPC node and its heterogeneous chips, such as CPUs or GPUs, and their components, such as caches, cores, or thread blocks. This information is needed by various users in areas such as scheduling, power management, and performance optimization, to name a few examples.

In this thesis, we will focus on extending sys-sage with a mechanism to store and restore the system state data. In other words, it should be possible to write all information contained in sys-sage to a file and to use this file to restore an identical system representation. The tasks include designing a format that is efficient both in terms of storage size and read/write complexity. There are multiple use cases for this added functionality (see the interface sketch below), including backing up the stored information, transferring the topology from one system (the measured one) to another (a developer's workstation), or combining multiple outputs (e.g., merging the outputs of 10 nodes to create a system-wide topology representation).
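To make the goal concrete, here is a minimal sketch of what the store/restore interface could look like. The function names and signatures are purely hypothetical and do not represent the existing sys-sage API; the thesis' design questions (on-disk format, versioning, attribute encoding) live behind them.

```cpp
#include <string>

// Hypothetical interface sketch; these declarations are NOT the existing
// sys-sage API. The on-disk format, versioning, and the encoding of
// component attributes are exactly the design questions of the thesis.
class Topology;  // assumed root of the sys-sage component tree

// Serialize the whole component tree, including custom attributes,
// into a self-contained file.
bool exportTopology(const Topology& root, const std::string& path);

// Rebuild an identical in-memory representation from such a file;
// returns nullptr on a parse error or version mismatch.
Topology* importTopology(const std::string& path);

// Merge another tree (e.g., a second node's topology) into an existing
// one, enabling a combined, system-wide representation.
bool mergeTopologies(Topology& target, const Topology& other);
```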

Contact:
In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz) and attach your CV & transcript of records.

Published on 19.04.2022
 

Background:

The DEEP-SEA project is a joint European effort on developing software for coming exascale supercomputing architectures. CAPS TUM, as a member of the project, collaborates with approximately a dozen leading universities and research institutions from multiple countries throughout Europe. 

Within the scope of this project, among other tasks, we are responsible for developing an environment for analyzing application and system performance in terms of data movement. Data movement is very costly compared to computation. Therefore, suboptimal memory access patterns in an application can have a huge negative impact on overall performance. Conversely, analyzing and optimizing data movement can massively increase the overall performance of parallel applications.

We work on several projects with the goal to create a full in-depth analysis of a memory-related application behaviour and capabilities of a system. The main projects are 'Mitos' (originally developed at LLNL), which collects memory access data, 'sys-sage', which captures the memory and compute topology and capabilities of a system, and 'MemAxes' (LLNL), which analyzes and visualizes outputs of the aforementioned projects.

There is an existing PoC of these projects, and we plan to extend and improve them massively to fit the needs of state-of-the-art and future HPC systems, which are expected to form the core of upcoming exascale supercomputers. Our work and research touch modern heterogeneous architectures, patterns, and designs, and aim at enabling users to run extreme-scale applications while utilizing as much of the underlying hardware as possible.

  • www.deep-projects.eu
  • github.com/LLNL/MemAxes
  • github.com/LLNL/Mitos
  • github.com/stepanvanecek/sys-sage

Context:

  • Currently, MemAxes is built to visualise data gathered on a single (multi-core) CPU.
  • We want to make the analysis with MemAxes more comprehensive and therefore want to collect information about all chips on a node. A modern heterogeneous node can contain multiple CPUs, GPUs, and possibly FPGAs; in the future, network cards may also become of interest. Moreover, all components are connected to each other in some way.

Tasks/Goals: 

  • Adapt MemAxes to support visualisations of heterogeneous nodes (nodes containing multiple CPUs and GPUs; a universal solution that also fits other types of chips would be nice, but is not a must).
  • As there will be more information available than fits on one screen, come up with a design that offers both an overview and sufficient detail (e.g., zooming in/out to different parts of the node).
  • Create an intuitive concept and adapt the visualisation, the user interface, and the data/aggregates being presented based on which parts of the system are currently shown (or provide an alternative solution that is intuitive and clear).

Contact:

In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz) and attach your CV & transcript of records.

Updated on 08.03.2022

Background:
The DEEP-SEA project (https://www.deep-projects.eu) is a European effort on developing software for coming exascale supercomputing architectures. As a member of the project, CAPS TUM works in several areas. One of these is the development of a tool for analyzing application performance by identifying suboptimal memory behaviour of an application. Memory operations are very costly; unoptimized memory access patterns in an application can therefore have a huge negative impact on overall performance. For this reason, analyzing and optimizing data movement on a single node can play a very important role in increasing the performance of parallel applications. We took over the MemAxes tool (https://github.com/LLNL/MemAxes), originally developed at LLNL, as the base for our memory access visualisation tool, and plan to extend and improve it massively to fit the needs of modern heterogeneous architectures, which are expected to form the core of upcoming exascale supercomputers. Along with the MemAxes visualisation tool, we develop the Mitos tool (https://github.com/LLNL/Mitos), which collects and provides the data for the visualisation.

Context:

  • The current implementation of Mitos/MemAxes collects PEBS samples of memory accesses (via perf), i.e., every n-th memory operation is measured and stored (see the sketch below).
  • Collecting aggregate data alongside the PEBS samples could help increase the overall understanding of system and application behaviour.
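For context, the sketch below shows roughly how such PEBS-based load sampling is configured through the perf_event_open system call. The event encoding shown is CPU-specific and purely illustrative, and Mitos' actual code differs.

```cpp
#include <cstring>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/perf_event.h>

// Condensed sketch: configure a sampled memory-load event so that every
// n-th load is recorded with its address, latency weight, and data
// source (the fields MemAxes visualises). Real code must choose the
// event encoding per CPU model and mmap a ring buffer to read samples.
int openLoadSampling(unsigned long long samplePeriod) {
    struct perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = 0x1cd;   // e.g., Intel MEM_TRANS_RETIRED.LOAD_LATENCY (CPU-specific!)
    attr.config1 = 3;      // minimum load latency threshold (ldlat)
    attr.sample_period = samplePeriod;
    attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR |
                       PERF_SAMPLE_WEIGHT | PERF_SAMPLE_DATA_SRC;
    attr.precise_ip = 2;   // request PEBS-level precision
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    // pid = 0: this process, cpu = -1: any CPU, no group, no flags.
    return static_cast<int>(
        syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
}
```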

Tasks/Goals: 

  • Analyse which aggregate data are meaningful and possible to collect (total traffic, bandwidth utilization, number of loads/stores, ...?) and how to collect them (PAPI? LIKWID? perf?).
  • Ensure that these measurements don't interfere with the existing collection of PEBS samples.
  • Design and implement a low-overhead solution.
  • Find a way to visualise/present the data in the MemAxes tool (or in a different visualisation tool if MemAxes is not suitable).
  • Finally, show how the newly collected data help users understand the system, and hint at whether/how to optimize.

Contact:

In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz).
 

Updated on 19.04.2022

Background:

The DEEP-SEA project is a joint European effort on developing software for coming exascale supercomputing architectures. CAPS TUM, as a member of the project, collaborates with approximately a dozen leading universities and research institutions from multiple countries throughout Europe. 

Within the scope of this project, among other tasks, we are responsible for developing an environment for analyzing application and system performance in terms of data movement. Data movement is very costly compared to computation. Therefore, suboptimal memory access patterns in an application can have a huge negative impact on overall performance. Conversely, analyzing and optimizing data movement can massively increase the overall performance of parallel applications.

We work on several projects with the goal to create a full in-depth analysis of a memory-related application behaviour and capabilities of a system. The main projects are 'Mitos' (originally developed at LLNL), which collects memory access data, 'sys-sage', which captures the memory and compute topology and capabilities of a system, and 'MemAxes' (LLNL), which analyzes and visualizes outputs of the aforementioned projects.

There is an existing PoC of these projects, and we plan to extend and improve them massively to fit the needs of state-of-the-art and future HPC systems, which are expected to form the core of upcoming exascale supercomputers. Our work and research touch modern heterogeneous architectures, patterns, and designs, and aim at enabling users to run extreme-scale applications while utilizing as much of the underlying hardware as possible.

  • www.deep-projects.eu
  • github.com/LLNL/MemAxes
  • github.com/LLNL/Mitos
  • github.com/stepanvanecek/sys-sage

Context:

  • Intel PEBS (Precise Event-Based Sampling) enables memory access data collection on modern Intel CPUs and is used by the Mitos project (https://github.com/LLNL/Mitos) to collect data access samples for MemAxes.
  • AMD's IBS should offer similar functionality; however, it is not supported by Mitos at the moment.
  • To increase the versatility of the Mitos/MemAxes projects, their functionality should not be limited to Intel CPUs.

Tasks/Goals: 

  • Investigate the functionality of IBS and find out whether it can provide the same data on AMD CPUs as PEBS does. If not, research possible alternatives to supplement the missing functionality. If IBS provides additional relevant information, propose how the data could be used by the MemAxes tool.
  • Implement memory access data collection on AMD CPUs and include it in the Mitos project.
  • Design logic in Mitos that automatically switches between AMD and Intel processors so that the same code can be compiled and run on both platforms (see the sketch after this list).
  • Present the data collected by an example application in MemAxes.
  • GR: This topic can be adapted as a literature survey and PoC/experimental development only to keep the scope feasible.
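As a starting point for the switching logic, here is a minimal sketch under the assumption that the kernel exports the AMD IBS PMU via sysfs; function and enum names are illustrative, not existing Mitos API.

```cpp
#include <fstream>

// Sketch of the platform switch: on AMD, memory-access sampling goes
// through the dynamic "ibs_op" PMU, whose type id the kernel exports in
// sysfs; on Intel, a PEBS event on the core "cpu" PMU is used instead.
// Function and enum names are illustrative, not existing Mitos API.
enum class SamplingBackend { IntelPEBS, AmdIBS, Unsupported };

SamplingBackend detectSamplingBackend() {
    int pmuType = 0;
    // The ibs_op PMU directory only exists on AMD CPUs with IBS support.
    std::ifstream ibs("/sys/bus/event_source/devices/ibs_op/type");
    if (ibs >> pmuType)
        return SamplingBackend::AmdIBS;    // use pmuType as perf_event_attr.type

    std::ifstream cpu("/sys/bus/event_source/devices/cpu/type");
    if (cpu >> pmuType)
        return SamplingBackend::IntelPEBS; // assume a PEBS-capable core PMU
    return SamplingBackend::Unsupported;
}
```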

Contact:
In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz).

Updated on 19.04.2022

Various MPI-Related Topics

Please Note: MPI is a high-performance programming model and communication library designed for HPC applications. It is designed and standardised by the members of the MPI Forum, which includes various research, academic, and industrial institutions. The current chair of the MPI Forum is Prof. Dr. Martin Schulz.

The following topics are all available as a Master's Thesis or Guided Research. They will be advised and supervised by Prof. Dr. Martin Schulz himself, with the help of researchers from the chair. If you are very familiar with MPI and parallel programming, please don't hesitate to drop a mail to Prof. Dr. Martin Schulz.

These topics are mostly related to current research and active discussions in the MPI Forum, which are subject to standardisation in the coming years. Your contributions to these topics may make you a contributor to the MPI Standard, and your implementation may become part of the Open MPI code base. Many of these topics require collaboration with other MPI research bodies, such as Lawrence Livermore National Laboratory and the Innovative Computing Laboratory. Some of them may require you to attend MPI Forum meetings, which take place in the late afternoon (to accommodate participants across time zones worldwide). Generally, these advanced topics may require more effort to understand and may be more time-consuming, but they are more prestigious, too.

LAIK is a new programming abstraction developed at LRR-TUM

  • Decouples data decomposition and computation, while hiding communication
  • Applications work on index spaces
  • Mapping of index spaces to nodes can be adaptive at runtime
  • Goal: dynamic process management and fault tolerance
  • Current status: works on standard MPI, but no dynamic support

Task 1: Port LAIK to Elastic MPI

  • New model developed locally that allows process additions and removal
  • Should be very straightforward

Task 2: Port LAIK to ULFM

  • Proposed MPI FT Standard for “shrinking” recovery, prototype available
  • Requires refactoring of code and evaluation of ULFM

Task 3: Compare performance with direct implementations of same models on MLEM

  • Medical image reconstruction code
  • Requires porting MLEM to both Elastic MPI and ULFM

Task 4: Comprehensive Evaluation

ULFM (User-Level Fault Mitigation) is the current proposal for MPI Fault Tolerance

  • Failures make communicators unusable
  • Once detected, communicators can be “shrunk”
  • Detection is active and synchronous by capturing error codes
  • Shrinking is collective, typically after a global agreement
  • Problem: can lead to deadlocks (see the recovery sketch below)
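For reference, here is a sketch of the classic, collective recovery pattern with the ULFM prototype API (error handling trimmed); the lazy alternative below aims to avoid exactly this global step.

```cpp
#include <mpi.h>
#include <mpi-ext.h>  // ULFM prototype extensions (MPIX_*), e.g., in Open MPI

// Sketch of the classic, collective ULFM recovery pattern (error
// handling trimmed). The lazy, non-collective alternative would avoid
// the global shrink/agreement step shown here.
void recoverAfterFailure(MPI_Comm* comm) {
    // Acknowledge locally detected process failures...
    MPIX_Comm_failure_ack(*comm);

    // ...then all survivors collectively build a new, smaller
    // communicator. This synchronous global step is where the
    // deadlock risk mentioned above comes from.
    MPI_Comm shrunk;
    MPIX_Comm_shrink(*comm, &shrunk);

    MPI_Comm_free(comm);
    *comm = shrunk;
}
```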

Alternative idea

  • Make shrinking lazy and thereby non-collective
  • New, smaller communicators are created on the fly

Tasks:

  • Formalize the non-collective shrinking idea
  • Propose API modifications to ULFM
  • Implement a prototype in Open MPI
  • Evaluate performance
  • Create a proposal that can be discussed in the MPI Forum

ULFM works on the classic MPI assumptions

  • Complete communicator must be working
  • No holes in the rank space are allowed
  • Collectives always work on all processes

Alternative: break these assumptions

  • A failure creates communicator with a hole
  • Point to point operations work as usual
  • Collectives work (after acknowledgement) on reduced process set

Tasks:

  • Formalize “hole-y” shrinking
  • Propose a new API
  • Implement prototype in Open MPI
  • Evaluate performance
  • Create proposal that can be discussed in the MPI Forum

With MPI 3.0, MPI added a second tools interface: MPI_T

  • Access to internal variables 
  • Query, read, write
  • Performance and configuration information
  • Missing: event information using callbacks
  • New proposal in the MPI Forum (driven by RWTH Aachen)
  • Add event support to MPI_T
  • Proposal is rather complete (see the MPI_T sketch below)
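As a starting point, here is a sketch of the existing polling-style MPI_T performance-variable interface that the event proposal extends with callbacks; error checks are omitted, and it assumes for simplicity that the first pvar is a 64-bit counter.

```cpp
#include <mpi.h>
#include <cstdio>

// Sketch: enumerate MPI_T performance variables and read the first one.
// This is the existing polling-style interface; the event proposal adds
// callback-based notification on top. Error checks omitted, and we
// assume pvar 0 is a continuous, 64-bit counter for simplicity.
int main(int argc, char** argv) {
    int provided = 0, numPvars = 0;
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_pvar_get_num(&numPvars);
    std::printf("%d performance variables exposed\n", numPvars);

    if (numPvars > 0) {
        MPI_T_pvar_session session;
        MPI_T_pvar_handle handle;
        int count = 0;
        MPI_T_pvar_session_create(&session);
        // Bind variable 0 to no particular MPI object for this session.
        MPI_T_pvar_handle_alloc(session, 0, NULL, &handle, &count);

        unsigned long long value = 0;
        MPI_T_pvar_read(session, handle, &value);
        std::printf("pvar[0] = %llu\n", value);

        MPI_T_pvar_handle_free(session, &handle);
        MPI_T_pvar_session_free(&session);
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```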

Tasks:

  • Implement prototype in either Open MPI or MVAPICH
  • Identify a series of events that are of interest
  • Message queuing, memory allocation, transient faults, …
  • Implement events for these through MPI_T
  • Develop tool using MPI_T to write events into a common trace format
  • Performance evaluation

Possible collaboration with RWTH Aachen

 

PMIx is a proposed resource management layer for runtimes (for exascale)

  • Enables the MPI runtime to communicate with resource managers
  • Came out of previous PMI efforts as well as the Open MPI community
  • Under active development / prototype available in Open MPI

Tasks: 

  • Implement PMIx on top of MPICH or MVAPICH
  • Integrate PMIx into SLURM
  • Evaluate implementation and compare to Open MPI implementation
  • Assess and possibly extend interfaces for tools
  • Query process sets

MPI was originally intended as runtime support, not as an end-user API

  • Several other programming models use it that way
  • However, it is often not the first choice for performance reasons
  • Especially task/actor based models require more asynchrony

Question: can more asynchronous models be added to MPI?

  • Example: active messages

Tasks:

  • Understand communication modes in an asynchronous model
  • Charm++: actor-based (UIUC)
  • Legion: task-based (Stanford, LANL)
  • Propose extensions to MPI that capture this model better
  • Implement prototype in Open MPI or MVAPICH
  • Evaluation and Documentation

Possible collaboration with LLNL and/or BSC

MPI can and should be used for more than Compute

  • Could be runtime system for any communication
  • Example: traffic to visualization / desktops

Problem:

  • Different network requirements and layers
  • May require different MPI implementations
  • Common protocol is unlikely to be accepted

Idea: can we use a bridge node with two MPIs linked to it

  • User should see only two communicators, but same API

Tasks:

  • Implement this concept coupling two MPIs
  • Open MPI on compute cluster and TCP MPICH to desktop
  • Demonstrate using on-line visualization streaming to front-end
  • Document and provide evaluation
  • Warning: likely requires good understanding of linkers and loaders

Field-Programmable Gate Arrays

Field Programmable Gate Arrays (FPGAs) are considered to be the next generation of accelerators. Their advantages range from improved energy efficiency for machine learning to faster routing decisions in network controllers. If you are interested in one of these topics, please send your CV and transcript of records to the specified email address.

Our chair offers various topics available in this area:

  • Direct network operations: Here, FPGAs are wired closer to the networking hardware itself, which allows bypassing the network stack that regular CPU-style communication would be exposed to. Your task would be to investigate FPGAs that can interact with the network more closely than CPU-based approaches. ( martin.schreiber@tum.de )
  • Linear algebra: Your task would be to explore strategies to accelerate existing linear algebra routines on FPGA systems by taking into account applications requirements. ( martin.schreiber@tum.de )
  • Varying accuracy of computations: Current floating-point computations use a granularity of 16, 32, or 64 bits. Your work would be on tailoring the accuracy of computations to what is really required. ( martin.schreiber@tum.de )
  • ODE solver: You would work on an automatic toolchain for solving ODEs originating from computational biology. ( martin.schreiber@tum.de )

 

Various Thesis Topics in Collaboration with Leibniz Supercomputing Centre

As climate change progresses, the intensity and frequency of fires devastating forests and vegetation are growing. Compared to other aspects of climate change, research into wildfire prediction models and spread simulations is still in its infancy.

An approach that has recently proven successful consists in applying Artificial Intelligence (AI) algorithms to build fire propagation models and predictors. These algorithms are fueled by the large volume of fires recorded automatically today, together with manual annotations of fire type, vegetation type, and fire cause.

As the amount of collected data increases due to the availability of more sensors, satellites capturing images, etc., the considered AI algorithms are expected to benefit proportionally. At the same time, these large volumes of data are expected to pose challenges for storage, the preprocessing pipeline, and inference throughput, therefore demanding High-Performance Computing (HPC) and High-Performance Data Analytics (HPDA) capabilities.

This master's thesis is a collaboration between OroraTech, a Munich-based company that sees itself in a crucial position to use the data generated by the operational use of the Wildfire System for research purposes, and the Leibniz Supercomputing Centre (LRZ). The aim is to evaluate (and optimize) wildfire prediction models built on large datasets by considering different storage and computing technologies.

Interested students are welcome to contact Juan J. Durillo (durillo@lrz.de) and Nicolay Hammer (Nicolay.Hammer@tum.de) for further information.

Applied mathematics & high-performance computing

There are various topics available in the area bridging applied mathematics and high-performance computing. Please note that this will be supervised externally by Prof. Dr. Martin Schreiber (a former member of this chair, now at Université Grenoble Alpes).

This is just a selection of some topics to give some inspiration:

(MA = Master's in Math/CS, CSE = Computational Science and Engineering)

  • HPC tools:
    • Automated Application Performance Characteristics Extraction
    • Portable performance assessment for programs with flat performance profile, BA, MA, CSE
  • Projects targeting Weather (and climate) forecasting
    • Implementation and performance assessment of ML-SDC/PFASST in OpenIFS (collaboration with the European Center for Medium-Range Weather Forecast), CSE, MA
    • Efficient realization of fast Associated Legendre transformations on GPUs (collaboration with the European Center for Medium-Range Weather Forecast), CSE, MA
    • Fast exponential and implicit time integration, BA, MA, CSE
    • MPI parallelization for the SWEET research software, MA, CSE
    • Semi-Lagrangian methods with Parareal, CSE, MA
    • Non-interpolating Semi-Lagrangian Schemes, CSE, MA
    • Time-splitting methods for exponential integrators, CSE, MA
    • Machine learning for non-linear time integration, CSE, MA
    • Exponential integrators and higher-order Semi-Lagrangian methods

  • Ocean simulations:
    • Porting the NEMO ocean simulation framework to GPUs with a source-to-source compiler
    • Porting the Croco ocean simulation framework to GPUs with a source-to-source compiler
       
  • Health science project: Biological parameter optimization
    • Extending a domain-specific language with time integration methods
    • Performance assessment and improvements for different hardware backends (GPUs / FPGAs / CPUs)

If you're interested in any of these projects, or if you are looking for other projects in this area, please drop me an email for further information.

In-Situ/In-Transit Data Transformation Using Low-Power Processors