Open Thesis Topics

Distributed Deep Learning for Video Analytics

Keywords:
Distributed Deep Learning, Distributed Computing, Video Analytics, Edge Computing, Edge AI

Description

In recent years, deep learning-based algorithms have demonstrated superior accuracy in video analysis tasks, and scaling up such models, i.e., designing and training larger models with more parameters, can improve their accuracy even further.

On the other hand, due to strict latency requirements as well as privacy concerns, there is a tendency towards deploying video analysis tasks close to the data sources, i.e., at the edge. However, compared to dedicated cloud infrastructures, edge devices (e.g., smartphones and IoT devices) as well as edge clouds are constrained in terms of compute, memory, and storage resources, which leads to a trade-off between response time and accuracy.

Considering video analysis tasks such as image classification and object detection as the application at the heart of this project, the goal is to evaluate different deep learning model distribution techniques for a scenario of interest.
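As a rough starting point, one common distribution technique can be illustrated by partitioning a model's ordered layers into contiguous stages, one per edge node. This is only a minimal sketch; the function names (`partition_layers`, `run_distributed`) are illustrative, not from an existing library.

```python
def partition_layers(layers, num_nodes):
    """Split an ordered list of layers into num_nodes contiguous stages,
    distributing any remainder over the first stages."""
    base, extra = divmod(len(layers), num_nodes)
    stages, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)
        stages.append(layers[start:start + size])
        start += size
    return stages

def run_distributed(x, stages):
    # Conceptually, each stage runs on a different edge node; the
    # intermediate activations are what must cross the network.
    for stage in stages:
        for layer in stage:
            x = layer(x)
    return x

# Toy "model": six layers, each a simple function applied in sequence.
layers = [lambda x, i=i: x + i for i in range(6)]
stages = partition_layers(layers, num_nodes=3)
```

In a real evaluation, the layers would be neural network operators and the per-stage placement would be chosen to balance compute load against the size of the activations sent between nodes.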

Supervisor:

Navidreza Asadi

Edge AI in Adversarial Environment: A Simplistic Byzantine Scenario

Keywords:
Distributed Deep Learning, Distributed Computing, Byzantine Attack, Adversarial Inference

Description

This project considers an environment consisting of several low-performance machines connected over a network.

Edge AI has drawn the attention of both academia and industry as a way to bring intelligence to edge devices, improving data privacy and reducing latency.

Prior work has investigated improving the accuracy-latency trade-off of Edge AI by distributing a model across multiple available and idle machines. Building on top of those works, this project adds one more dimension: a scenario where $f$ out of $n$ contributing nodes are adversarial.

For each data sample, an adversary (1) may not provide an output (which can also model a faulty node), or (2) may provide an arbitrary (i.e., randomly generated) output.

The goal is to evaluate the robustness of different parallelism techniques in terms of achievable accuracy in the presence of malicious contributors and/or faulty nodes.
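As a minimal sketch of one possible robustness baseline, assume each node emits a (possibly missing or random) class prediction and the results are combined by majority vote; all names below are hypothetical.

```python
import random
from collections import Counter

def aggregate_majority(predictions):
    """Majority vote over per-node class predictions.
    None models a silent/faulty node; arbitrary values model adversaries."""
    valid = [p for p in predictions if p is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]

# n = 7 nodes, f = 2 Byzantine: one silent, one answering randomly.
random.seed(42)
honest = [3] * 5                          # honest nodes agree on class 3
byzantine = [None, random.randrange(10)]  # no output + arbitrary output
prediction = aggregate_majority(honest + byzantine)
```

Note that such voting applies directly only when every node produces a full prediction (e.g., replica- or ensemble-style schemes); under pipelined model parallelism a single Byzantine stage can corrupt all downstream computation, which is exactly the kind of difference between parallelism techniques this evaluation would surface.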

Note that, contrary to most of the existing literature, this project mainly focuses on the inference (i.e., serving) phase of deep learning algorithms; although the robustness of the training phase can be considered as well, it has a much lower priority.

Supervisor:

Navidreza Asadi

On the Efficiency of Deep Learning Parallelism Schemes

Keywords:
Distributed Deep Learning, Parallel Computing, Inference, AI Serving

Description

Deep learning models are becoming increasingly large, to the point that most state-of-the-art model architectures are either too big to be deployed on a single machine or cause performance issues such as undesired delays.

This is true not only for the largest models deployed in high-performance cloud infrastructures but also for smaller, more efficient models that are designed with fewer parameters (and hence lower accuracy) to be deployed on edge devices.

That said, this project considers the second environment, where multiple resource-constrained machines are connected through a network.

Continuing the research on distributing deep learning models across multiple machines, the objective is to generate variants/submodels that are more efficient than those produced by existing deep learning parallelism algorithms.

Note that this project mainly focuses on the inference (i.e., serving) phase of deep learning algorithms; although the efficiency of the training phase can be considered as well, it has a much lower priority.

Supervisor:

Navidreza Asadi

Optimizing Communication Efficiency of Deep Learning Parallelism Techniques in the Inference Phase

Keywords:
Distributed Deep Learning, Parallel Computing, Inference, Communication Efficiency

Description

Deep learning models are becoming increasingly large, to the point that most state-of-the-art model architectures are either too big to be deployed on a single machine or cause performance issues such as undesired delays.

This is true not only for the largest models deployed in high-performance cloud infrastructures but also for smaller, more efficient models that are designed with fewer parameters (and hence lower accuracy) to be deployed on edge devices.

That said, this project considers the second environment, where multiple resource-constrained machines are connected through a network.

When distributing deep learning models across multiple compute nodes to realize parallelism, certain algorithms (e.g., Model Parallelism) cannot achieve the desired performance in terms of latency, mainly due to (1) the communication cost of intermediate tensors and (2) inter-operator blocking.

This project consists of multiple sub-projects, each of which can be taken separately.

In the context of Model Parallelism, two potential modifications can be considered: 

  • Pipeline parallelism: assuming a live stream of input data, delay the inference of the first few data samples so that subsequent samples can be processed in parallel across stages.
  • Finding certain points in deep learning architectures, or modifying the architecture itself, so that for each data sample some sub-parts of the model can be filtered out, thereby reducing the transmitted data while still achieving comparable accuracy.
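The expected benefit of the first modification can be sketched with a back-of-the-envelope latency model, assuming an idealized pipeline of equal-duration stages and a steady input stream; the function names and parameters are illustrative only.

```python
def pipelined_completion_times(num_samples, num_stages, stage_time):
    """In a full pipeline, sample i finishes at (num_stages + i) * stage_time:
    every stage_time interval, one more sample completes."""
    return [(num_stages + i) * stage_time for i in range(num_samples)]

def sequential_completion_times(num_samples, num_stages, stage_time):
    """Without pipelining, sample i waits for all earlier samples to pass
    through every stage, finishing at (i + 1) * num_stages * stage_time."""
    return [(i + 1) * num_stages * stage_time for i in range(num_samples)]
```

For example, with three stages of one second each, four samples complete at t = 3, 4, 5, 6 under pipelining versus t = 3, 6, 9, 12 without it: throughput approaches one sample per stage_time at the cost of buffering the first few samples of the stream.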

Class and Variant Parallelism improve inter-node communication significantly. However, the input data needs to be shared among the contributing nodes. The goal is to propose a technique that transmits less data and strikes a good trade-off between computation and communication.
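As a rough illustration of why the input must reach every node under class parallelism (all names hypothetical): each node hosts a submodel that scores only its own subset of classes, and the final label is the best score across nodes.

```python
def class_parallel_predict(x, node_submodels):
    """Each submodel returns {class_id: score} for its class subset;
    the same input x is sent to every node, which is the communication
    cost this sub-project aims to reduce."""
    scores = {}
    for submodel in node_submodels:
        scores.update(submodel(x))
    return max(scores, key=scores.get)

# Toy submodels: node 0 covers classes {0, 1}, node 1 covers {2}.
nodes = [
    lambda x: {0: 0.1 * x, 1: 0.2 * x},
    lambda x: {2: 0.3 * x},
]
```

A candidate direction would be to replace the full input with a compressed or shared intermediate representation, trading extra per-node computation for less transmitted data.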

Note that this project mainly focuses on the inference (i.e., serving) phase of deep learning algorithms; although the efficiency of the training phase can be considered as well, it has a much lower priority.

Supervisor:

Navidreza Asadi

Load Generation for Benchmarking Kubernetes Autoscaler

Keywords:
Horizontal Pod Autoscaler (HPA), Kubernetes (K8s), Benchmarking

Description

Kubernetes (K8s) has become the de facto standard for orchestrating containerized applications. K8s is an open-source framework which, among many other features, provides automated scaling and management of services.

Considering a microservice-based architecture, where each application is composed of multiple independent services (usually each providing a single functionality), K8s' Horizontal Pod Autoscaler (HPA) can be leveraged to dynamically change the number of instances (also known as Pods) based on the workload and incoming request pattern.

The main focus of this project is to benchmark the HPA behavior of a Kubernetes cluster running a microservice-based application with multiple services chained together. That is, there are dependencies between services: sending a request to one service may cause other services to be called once or multiple times.

This project aims to generate incoming request load patterns that increase either the operational cost of the Kubernetes cluster or the response time of requests. This can help identify corner cases of the autoscaling algorithm and/or weak spots of the system; hence the name adversarial benchmarking.
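One simple candidate for such an adversarial pattern, sketched below under the assumption that the load generator emits a target requests-per-second value per time step, is a square wave whose period is tuned to the autoscaler's reaction time, so that the HPA may always lag one phase behind the load; all parameter names are illustrative.

```python
def bursty_pattern(duration_s, base_rps, burst_rps, period_s):
    """Square-wave load: the first half of each period bursts, the second
    half idles at the baseline. A period close to the HPA's sync and
    stabilization window can force repeated scale-up/scale-down cycles,
    paying for Pods that arrive only after each burst has passed."""
    return [
        burst_rps if (t % period_s) < period_s // 2 else base_rps
        for t in range(duration_s)
    ]
```

Such a per-second rate schedule could then be fed to any open-loop load generator against the entry service of the chain, with downstream services stressed indirectly through the call dependencies.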

The applications can be selected from commonly used benchmarks such as DeathStarBench*. The objective is to investigate the dependencies between services and how different sequences of incoming request patterns affect each service as well as the system as a whole.

* https://github.com/delimitrou/DeathStarBench/blob/master/hotelReservation/README.md

Supervisor:

Navidreza Asadi