Abstract
As software systems evolve, monolithic applications are often decomposed into collections of lightweight, loosely coupled microservices. Understanding how these microservices behave is crucial for optimizing their deployment. However, existing research rarely provides a comprehensive analysis of microservice performance and resource utilization, particularly with respect to performance imbalances across microservice containers in large-scale production clusters. This paper addresses that gap with an extensive analysis of performance imbalances within Alibaba's clusters, using response time as the key indicator. We also investigate utilization imbalance. Our study reveals that a substantial number of microservices suffer from performance imbalances, which we attribute primarily to disparities in node resource utilization and to uneven communication overhead from inter-node requests. Furthermore, we find that offline jobs contribute significantly to the imbalanced utilization of node resources. Based on these findings, we propose a novel joint optimization designed to minimize inter-node requests, with the aim of improving the end-to-end performance of the system.
Materials
Full text available here.
Cite this work
@INPROCEEDINGS{10491818,
  author={Lin, Chenyu and Luo, Shutian and Xu, Huanle},
  booktitle={2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)},
  title={Exploring Imbalances among Microservice Containers in Large Cloud Platforms},
  year={2023},
  volume={},
  number={},
  pages={237-245},
  keywords={Costs;Microservice architectures;Production;Containers;Software systems;Mathematical models;Resource management},
  doi={10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00064}
}