Predictive Autoscaling Engine

Project Motivation:

With computing power and storage becoming a commodity available over the Internet, people all over the world have gained the opportunity to provide society with digital services that solve everyday problems as well as highly complex tasks. The technology that brings this seemingly unlimited compute power to society became known as cloud computing. Besides the clear economic benefit of not having to buy expensive hardware, cloud computing supports sustainability: its economics encourage companies and individuals to move from owning resources to renting them, and this paradigm shift reduces the demand for energy. However, as cloud computing has penetrated every area of life, a significant drawback of the technology has become obvious: cloud infrastructure is complex to operate and is subject to failures. This issue hinders its adoption in critical domains such as healthcare, public transportation, manufacturing, and telecommunication. In this proposal we address the challenge of making cloud-based digital infrastructure highly reliable and fault-tolerant for these critical areas by means of service level agreement (SLA)-aware predictive autoscaling of both the virtual infrastructure and the cloud applications.


Tuning a cloud application involves two major mechanisms: scaling of the virtual infrastructure and scaling of the cloud application that runs on it. Scaling encompasses both an increase and a decrease in the amount of virtualized resources provided to the cloud application, i.e. new virtual machines may be started when the demand for the cloud application rises and terminated when the demand decreases. Analogously, microservices, the building blocks of modern cloud applications, can be scaled by starting new instances or terminating old ones. The adaptation of cloud resources to the demand is thus represented by these two fundamental actions: increasing and decreasing the resources provided. These actions can be conducted either manually or automatically. Automatic scaling (autoscaling) comes in three types: reactive, scheduled, and predictive. Reactive autoscaling, the most widespread type among IaaS and PaaS cloud services providers (CSP), automates the scaling of the virtual infrastructure and the application by observing metrics (e.g. CPU utilization) and comparing them against predefined thresholds. If an observed value violates a threshold, a scaling action is triggered. Scheduled autoscaling, in turn, follows a predefined schedule of scaling actions, which allows it to cope with load that is known in advance, e.g. during the Christmas holidays. Predictive autoscaling addresses a challenge left uncovered by the other types of autoscaling: it is impossible to adapt a complex virtual infrastructure and cloud application to the demand instantly. Because the scaling process takes time, there is a high probability that the availability and the reliability of the cloud application will be affected, as shown by the preliminary research conducted in the scope of the project [2, 5].
Predictive autoscaling overcomes this challenge to a certain degree by deriving the anticipated future configuration of the virtual infrastructure and the cloud application from a performance model of the application and the predicted demand for its services. Thus, predictive autoscaling is an enabling technology for cloud computing in society-critical areas such as healthcare, public transportation, manufacturing, and telecommunication.
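
For comparison, the reactive rule described above can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular cloud provider: the function name, the threshold values (0.8 and 0.3), and the instance limits are assumptions chosen for the example.

```python
def reactive_decision(cpu_utilization, instances,
                      upper=0.8, lower=0.3,
                      min_instances=1, max_instances=10):
    """Return the instance count after one threshold check.

    All default values are illustrative assumptions, not provider defaults.
    """
    if cpu_utilization > upper and instances < max_instances:
        return instances + 1  # threshold violated upwards: scale out by one
    if cpu_utilization < lower and instances > min_instances:
        return instances - 1  # utilization below lower bound: scale in by one
    return instances          # within bounds (or at a limit): no action
```

Because such a rule only fires after the metric has already crossed the threshold, the new capacity becomes available only once the scaling process has completed, which is the delay that predictive autoscaling tries to hide.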

Project Description:

The project aims to implement the Predictive Autoscaling Engine for cloud-based distributed web applications that receive user requests and process them in the backend. The software will derive and execute a time-bound sequence of scaling actions that meets the service-level objectives (SLO) while minimizing the cost of the cloud services and resources used, by provisioning the needed cloud capacity in time. This goal will be achieved through a general pipeline of four components: a component that derives, through experiments, performance models expressed as the maximum number of requests that individual microservices of the cloud application and virtual machines can process; a component that forecasts the number of requests that will arrive in the future; a component that derives scaling policies containing the exact description of each scaling action at a specific point in time; and a component that executes the scaling actions directly through the corresponding application programming interfaces (APIs) of the cloud services providers. The pipeline will first conduct a sequence of load tests to determine the maximum number of requests for each microservice of the cloud application, then tune and fit the forecasting model to the pattern of the arriving request rate, and finally derive the sequence of future scaling actions from the anticipated number of requests and the capacity of the application and the virtual infrastructure. Each scaling action is then held back until the time specified in the scaling policy arrives. The dynamism of the cloud is captured by adapting the models to changes in the incoming data and in the structure of the application. The Predictive Autoscaling Engine will be compatible with the major IaaS and PaaS public cloud services providers, including Amazon, Google, and Microsoft.
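
The policy-derivation step of this pipeline can be illustrated with a minimal Python sketch. It is not the engine's actual implementation. It assumes a single service, a forecast given as a list of request rates per time step, a capacity per instance obtained from load tests, and a fixed instance startup delay. The names `ScalingAction`, `derive_scaling_policy`, and `startup_steps` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingAction:
    execute_at: int  # time step at which the action has to be started
    delta: int       # instances to add (> 0) or to remove (< 0)


def derive_scaling_policy(forecast, capacity_per_instance,
                          current_instances, startup_steps=1):
    """Turn a request-rate forecast into a time-bound list of scaling actions.

    forecast[t] is the predicted request rate at step t (assumed input).
    capacity_per_instance is the maximum rate one instance can serve.
    """
    policy = []
    instances = current_instances
    for t, rate in enumerate(forecast):
        # Instances needed to serve the predicted rate, never fewer than one.
        needed = max(1, math.ceil(rate / capacity_per_instance))
        delta = needed - instances
        if delta > 0:
            # Start scale-out early so the capacity is ready by step t.
            policy.append(ScalingAction(max(0, t - startup_steps), delta))
        elif delta < 0:
            # Scale in only once the lower demand has actually arrived.
            policy.append(ScalingAction(t, delta))
        instances = needed
    # Stable sort keeps the derivation order among actions at the same step.
    return sorted(policy, key=lambda action: action.execute_at)
```

With a forecast of `[120, 260, 90]` requests per step, a capacity of 100 requests per instance, one running instance, and a startup delay of one step, the sketch schedules two scale-out actions at step 0 and a scale-in of two instances at step 2. A real engine would additionally merge or cancel actions that fall on the same step and would weigh the cost of over-provisioning against the SLO.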

The picture shows the high-level architecture of the Predictive Autoscaling Engine pipeline.

Connected research projects:

  • Autoscaling Performance Measurement Tool ScaleX
  • Distributed Platform for Internet of Things

Duration: 2016 - 2020

Team: Vladimir Podolskiy, Anshul Jindal, Atakan Yenel, Yesika Ramirez, Saifeldin Ahmed, Harshit Chopra, Walentin Lamonos.

Funding Sources: German Academic Exchange Service (DAAD), Instana GmbH, Amazon, Google.

Partners: Instana, Huawei, University of Wuerzburg.

Supervision of the project: Prof. Dr. Hans Michael Gerndt

List of Published Papers:

  1. Vladimir Podolskiy, Hans Michael Gerndt, and Shajulin Benedict. 2017. QoS-based Cloud Application Management: Approach and Architecture. In Proceedings of the 4th Workshop on CrossCloud Infrastructures & Platforms (Crosscloud'17). ACM, New York, NY, USA, Article 7, 2 pages. DOI:
  2. Anshul Jindal, Vladimir Podolskiy, and Michael Gerndt. 2017. Multilayered Cloud Applications Autoscaling Performance Estimation. In Proceedings of the 2017 IEEE 7th International Symposium on Cloud and Service Computing. IEEE, pp. 24-31. DOI: 10.1109/SC2.2017.12. (Outstanding Paper Award)
  3. Anshul Jindal, Vladimir Podolskiy, and Michael Gerndt. 2018. Autoscaling Performance Measurement Tool. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). ACM, New York, NY, USA, 91-92. DOI:
  4. Vladimir Podolskiy, Yesika Ramirez, Atakan Yenel, Shumail Mohyuddin, Hakan Uyumaz, Ali Naci Uysal, Moawiah Assali, Sergei Drugalev, Matthias Friessnig, Anastasia Myasnichenko, and Michael Gerndt. Practical Education in IoT through Collaborative Work on Open-Source Projects with Industry and Entrepreneurial Organizations. 2018 IEEE Frontiers in Education Conference (FIE 2018).
  5. Vladimir Podolskiy, Anshul Jindal and Michael Gerndt. IaaS Reactive Autoscaling Performance Challenges. The 2018 IEEE International Conference on Cloud Computing (CLOUD 2018).
  6. Vladimir Podolskiy, Anshul Jindal, Michael Gerndt, and Yury Oleynik. Forecasting Models for Self-Adaptive Cloud Applications: A Comparative Study. 12th IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO '18).

Relevant public code repositories:

Contact: Vladimir Podolskiy