Most energy-efficient Core on a private Telco Cloud: Energy-optimized redundancy model for telco applications
Kubernetes, Energy Efficiency, 5G Core Network
Description
Motivation:
Deutsche Telekom operates and continuously develops its own cloud to run internet and telephony services (Telco applications). The Kubernetes-based cloud and the Telco applications together form a TaaP (Telco as a Platform). The TaaP comprises thousands of servers and hundreds of applications. Its energy efficiency is a key success criterion for optimizing costs, energy consumption, and carbon emissions. Hence, a concept of Full Stack Energy Management has been established. Its focus is to jointly optimize hardware, software, and services for energy efficiency without affecting service availability and robustness.
Problem & Challenge:
In the Telco industry, hardware (HW) redundancy has so far been the baseline for service robustness and resilience. The introduction of virtualization and containerization added a further redundancy level above the hardware. Classical redundancy models no longer apply to this multi-layer software (SW) redundancy. Moreover, there is so far no mathematical model that captures the service availability of these new architectures. For example, a Telco service can be provided via 3 data centers with 50 servers each, forming 1 cluster hosting 500 Kubernetes pods. This results in a mix of data center redundancy, hardware redundancy within the data centers, and Kubernetes worker node and pod redundancy.
Specific Problem Formulation:
On a TaaP, there are multiple layers of redundancy in hardware and software. On the one hand, there are multi-site deployments, where each site has several hundred servers. On the other hand, at each site, each server has multiple redundant hardware parts, such as the power supply. Moreover, a Kubernetes cluster, which is homed on one site, hosts multiple microservices, each with a different redundancy concept such as active/passive, n+1, n+m, etc. This mix of HW and SW redundancy causes inefficiency and is not easy to calculate or simulate in terms of overall service availability, network, site, redundancy, and energy consumption.
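As a minimal sketch of how the redundancy concepts named above can be compared, each of them can be read as a "k-out-of-n" system: the microservice is up while at least k of its n replicas are up. The function name, the per-replica availability, and the replica counts below are illustrative assumptions, not values from the TaaP. The sketch also assumes independent replica failures and instantaneous failover, which a real model would have to relax.

```python
from math import comb


def k_out_of_n_availability(n: int, k: int, a: float) -> float:
    """Probability that at least k of n independent replicas are up.

    Each replica is assumed to be up with probability a (binomial model).
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-replica availability, chosen only for illustration.
a = 0.99

active_passive = k_out_of_n_availability(n=2, k=1, a=a)  # 1 active, 1 standby
n_plus_1 = k_out_of_n_availability(n=5, k=4, a=a)        # 4 required, 1 spare
n_plus_m = k_out_of_n_availability(n=6, k=4, a=a)        # 4 required, 2 spares

print(f"active/passive: {active_passive:.6f}")
print(f"n+1 (4+1):      {n_plus_1:.6f}")
print(f"n+m (4+2):      {n_plus_m:.6f}")
```

Every additional spare raises availability but also adds a replica that consumes resources, which is the trade-off the thesis is meant to quantify.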
Solution Approach:
Many parameters in HW and SW affect service availability and energy consumption. First, a comprehensive list of these parameters is required, including a model of their dependencies. Second, a mathematical model needs to be set up that considers all of these parameters in "one equation". Third, a graphical simulation should be set up to demonstrate the dependencies and results.
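To illustrate what such a combined equation could look like, the following is a deliberately simplified sketch and not the model to be developed. All symbols are assumptions introduced here: $S$ sites, $n_s$ servers at site $s$ of which at least $k_s$ must be up, a per-server availability $a$, a site infrastructure availability $b_s$ (power, cooling, uplink), and an average power draw $P_{\mathrm{srv}}$ per server. It further assumes independent failures and that any single site can carry the full load.

```latex
% Availability of site s: site infrastructure in series with a k_s-out-of-n_s server pool
A_{\mathrm{site},s} = b_s \sum_{i=k_s}^{n_s} \binom{n_s}{i} \, a^{i} (1-a)^{n_s-i}

% Service availability: sites in parallel (service is up if at least one site is up)
A_{\mathrm{service}} = 1 - \prod_{s=1}^{S} \bigl(1 - A_{\mathrm{site},s}\bigr)

% Power draw of the deployed hardware
P_{\mathrm{total}} = P_{\mathrm{srv}} \sum_{s=1}^{S} n_s
```

A full model would additionally have to cover pod-level redundancy per microservice, correlated failures, and failover times, which this sketch omits.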
Expected Outcome:
A simulation and a mathematical model should be developed that consider software and hardware redundancy across multiple sites and SW layers to calculate the network-wide service availability. This is key to further optimizing the HW and SW footprint and improving sustainability. Moreover, the model should allow the optimization of the following parameters: the minimum HW required for a predefined service availability, the lowest energy consumption, and the best redundancy.
Working on the live network or setting up a lab deployment is outside the scope of this thesis. The focus of this thesis is the modeling and simulation of the deployment.
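The first optimization target, the minimum HW for a predefined service availability, can be sketched as a simple search over the number of spare servers. This is an illustration under strong assumptions (a single site, independent and identical servers, constant power draw per server); the function names and all numeric values are hypothetical and not taken from the TaaP.

```python
from math import comb


def k_out_of_n(n: int, k: int, a: float) -> float:
    """Probability that at least k of n independent servers are up."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def min_servers(required: int, server_avail: float, target: float,
                max_spares: int = 100) -> int:
    """Smallest server count n such that at least `required` of the n
    servers are up with probability >= target."""
    for spares in range(max_spares + 1):
        n = required + spares
        if k_out_of_n(n, required, server_avail) >= target:
            return n
    raise ValueError("target availability not reachable within max_spares")


# Hypothetical inputs, chosen only for illustration.
REQUIRED_SERVERS = 40      # servers needed to carry the load
SERVER_AVAILABILITY = 0.999
TARGET_AVAILABILITY = 0.99999
WATTS_PER_SERVER = 400.0   # assumed average power draw per server

n = min_servers(REQUIRED_SERVERS, SERVER_AVAILABILITY, TARGET_AVAILABILITY)
print(f"{n} servers ({n - REQUIRED_SERVERS} spares), "
      f"about {n * WATTS_PER_SERVER / 1000:.1f} kW")
```

Because availability grows monotonically with the number of spares in this model, the first server count that meets the target is also the one with the lowest power draw.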
Prerequisites
- Proficiency in English. The project language is English, and the team spans four EU countries.
- Advanced Kubernetes know-how.
- Basic knowledge of 5G Telco networks.
- Familiarity with tools such as GitLab and Wiki platforms.
- High level of self-engagement and motivation.
Contact
- Manuel Keipert (manuel.keipert@telekom.de)
- Valentin Haider (valentin.haider@tum.de)
- Razvan-Mihai Ursu (razvan.ursu@tum.de)