"HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences" has been accepted at Euro-Par 2025

Authors: Jianfeng Gu, Puxuan Wang, Isaac David Núñez Araya, Kai Huang and Michael Gerndt

Serverless computing (Function-as-a-Service, FaaS) has become a popular paradigm for deep learning inference thanks to its ease of deployment and pay-per-use billing. However, current serverless inference platforms allocate GPU resources coarsely and statically during scaling, which leads to high costs and Service Level Objective (SLO) violations under fluctuating workloads. Moreover, these platforms support only horizontal scaling for GPU inference, so cold starts further exacerbate both problems.

In this paper, we propose HAS-GPU, an efficient Hybrid Auto-scaling Serverless architecture with fine-grained GPU allocation for deep learning inference. HAS-GPU introduces an agile scheduler that allocates GPU Streaming Multiprocessor (SM) partitions and time quotas at arbitrary granularity and supports substantial vertical scaling of quotas at runtime. To resolve the performance uncertainty introduced by the massive space of fine-grained resource configurations, we propose the Resource-aware Performance Predictor (RaPP). Furthermore, we present an adaptive hybrid auto-scaling algorithm that combines horizontal and vertical scaling to meet inference SLOs while minimizing GPU costs.
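The idea of combining vertical and horizontal scaling can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: the names `Instance`, `predicted_rps`, `plan_scaling`, and `MIN_QUOTA` are hypothetical, the linear throughput model is only a stand-in for a learned predictor such as RaPP, and scale-in (removing instances) is left out. The sketch first resizes the time quota of existing instances (vertical) and starts new instances only when full quotas still cannot cover the demand (horizontal).

```python
from dataclasses import dataclass, replace

MIN_QUOTA = 0.1  # assumed lower bound on a time quota (hypothetical)


@dataclass(frozen=True)
class Instance:
    sm_fraction: float  # share of the GPU's SMs, fixed per instance here
    time_quota: float   # share of GPU time slices, in [MIN_QUOTA, 1.0]


def predicted_rps(inst: Instance, peak_rps: float) -> float:
    """Stand-in predictor: assumes throughput is linear in both shares."""
    return peak_rps * inst.sm_fraction * inst.time_quota


def _quota_for(remaining: float, full_rps: float) -> float:
    """Smallest allowed quota that covers `remaining`, capped at 1.0."""
    return min(1.0, max(MIN_QUOTA, remaining / full_rps))


def plan_scaling(instances, demand_rps, peak_rps, new_sm_fraction=0.5):
    """Return the planned instances for `demand_rps` requests per second."""
    planned = []
    remaining = demand_rps

    # Vertical step: resize the time quota of each existing instance.
    for inst in instances:
        full_rps = peak_rps * inst.sm_fraction  # throughput at quota 1.0
        resized = replace(inst, time_quota=_quota_for(remaining, full_rps))
        planned.append(resized)
        remaining = max(0.0, remaining - predicted_rps(resized, peak_rps))

    # Horizontal step: add instances only if demand is still uncovered.
    full_rps = peak_rps * new_sm_fraction
    while remaining > 1e-9:
        inst = Instance(new_sm_fraction, _quota_for(remaining, full_rps))
        planned.append(inst)
        remaining = max(0.0, remaining - predicted_rps(inst, peak_rps))

    return planned


if __name__ == "__main__":
    current = [Instance(sm_fraction=0.5, time_quota=0.2)]
    for inst in plan_scaling(current, demand_rps=120.0, peak_rps=100.0):
        print(inst)
```

Trying vertical scaling first reflects the motivation stated above: adjusting a quota on a running instance avoids the cold start that a new instance would incur.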

Experiments show that, compared to the mainstream serverless inference platform, HAS-GPU reduces function costs by 10.8x on average while providing better SLO guarantees. Compared to a state-of-the-art spatio-temporal GPU-sharing serverless framework, HAS-GPU reduces function SLO violations by 4.8x and cost by 1.72x on average.

Informatik 10 - Lehrstuhl für Rechnerarchitektur & Parallele Systeme

Prof. Dr. Martin Schulz
schulzm(at)in.tum.de

Prof. Dr. Michael Gerndt
gerndt(at)in.tum.de

Prof. Dr.-Ing. Carsten Trinitis
Carsten.Trinitis(at)tum.de

Address:
Technische Universität München
Boltzmannstraße 3
85748 Garching
Germany

Secretariat:
Room 01.04.40
Tel.: +49 89 289-17659
Fax: +49 89 289-17662
