Talk: Parimal Parag (November 26, 2025 at 10:00 AM, Seminar room N2407, Zoom)
Inference optimization for LLM serving systems
Parimal Parag
Indian Institute of Science at Bangalore
Abstract:
Large language models (LLMs) have led to ground-breaking improvements in the capabilities of generative AI (Gen-AI) applications. Their increased adoption has in turn driven growing volumes of user requests at LLM inference deployments. Common existing implementations of LLM inference engines perform a new prefill every time a prompt departs. We analytically model an inference system with a fixed batch size under a high rate of prompt arrivals, where prefills are scheduled only after a fixed number of prompt departures. We characterize the system throughput, measured as the number of prompts departing per unit time, for different thresholds, and observe that there exists an optimal threshold on the number of prompt departures that maximizes throughput. We verify this observation with vLLM experiments and compare the theoretically predicted optimal threshold to the experimentally observed one.
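The abstract describes a threshold rule: rather than prefilling on every prompt departure, the engine accumulates a fixed number of departures before scheduling a prefill that refills the batch. As a rough illustration only, the following is a minimal toy simulation of such a policy; the saturated-arrivals assumption, the geometric completion times, and all parameter names and values (batch_size, prefill_cost, decode_cost, p_finish) are illustrative assumptions, not the model analyzed in the talk.

```python
import random

def simulate_throughput(threshold, batch_size=32, prefill_cost=8.0,
                        decode_cost=1.0, p_finish=0.05, steps=50_000,
                        seed=0):
    """Toy discrete-time model of threshold-based prefill scheduling.

    Illustrative assumptions (not from the talk): prompt arrivals are
    saturated (a refill is always possible), each decode iteration costs
    `decode_cost` time units, a prefill that refills all freed slots costs
    `prefill_cost`, and each in-flight prompt completes at a given decode
    step independently with probability `p_finish`. The threshold should
    not exceed `batch_size`, or the refill condition is never reached.
    """
    rng = random.Random(seed)
    in_flight = batch_size      # prompts currently in the batch
    departed_since_prefill = 0  # departures accumulated since last prefill
    total_departures = 0
    clock = 0.0
    for _ in range(steps):
        # One decode iteration over the current (possibly partial) batch.
        clock += decode_cost
        finished = sum(rng.random() < p_finish for _ in range(in_flight))
        in_flight -= finished
        departed_since_prefill += finished
        total_departures += finished
        # Schedule a prefill only once `threshold` prompts have departed.
        if departed_since_prefill >= threshold:
            clock += prefill_cost     # decoding stalls during the prefill
            in_flight = batch_size    # refill freed slots from the backlog
            departed_since_prefill = 0
    return total_departures / clock   # departures per unit time

if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16, 32):
        print(f"threshold={k:2d}  throughput={simulate_throughput(k):.3f}")
```

Under these assumptions, sweeping the threshold exhibits the trade-off the abstract alludes to: small thresholds pay frequent prefill stalls, while large thresholds let the batch run partially empty for longer, so throughput peaks at an interior value.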
Biography:
Parimal Parag is currently an associate professor in the Department of Electrical Communication Engineering at the Indian Institute of Science, Bangalore. He worked as a senior systems engineer in R&D at ASSIA Inc. from October 2011 to November 2014. He received his B.Tech. and M.Tech. degrees from the Indian Institute of Technology Madras in fall 2004, and his PhD degree from Texas A&M University in fall 2011. He was at Stanford University in autumn 2010 and at Los Alamos National Laboratory in summer 2007.
His research interests are in the design, performance evaluation, and control of large distributed and networked intelligent systems, applying mathematical tools from queueing theory, information theory, coding theory, and optimization.
Zoom Link: https://tum-conf.zoom-x.de/j/66219422619?pwd=UGJdNUj67Px3mWmueTrxbDb3aiAfd0.1
Meeting ID: 662 1942 2619
Passcode: 545195