IEEE MetroInd 2026: Characterizing GPU Capacity and Computational Cost in Production AI Inference

My paper on a replica-centric capacity-planning framework for GPU and Multi-Instance GPU (MIG) based AI inference platforms was accepted at IEEE MetroInd4.0 & IoT 2026 in Rome, Italy.

My paper, "Characterizing GPU Capacity and Computational Cost in Production AI Inference Using a Replica-Centric Measurement Framework," was accepted at the 2026 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0 & IoT), held in Rome, Italy, June 10 to 12, 2026.

What the paper is about

The rapid adoption of deep learning and large language models in production has reshaped how large-scale platforms plan capacity. Unlike offline training, online inference must hold strict latency and availability guarantees under highly variable traffic, all while running on scarce and expensive GPUs. The paper presents a production-oriented capacity-planning framework for GPU and Multi-Instance GPU (MIG) based inference platforms, drawn from designing and operating a global inference platform that serves retrieval and ranking pipelines at massive scale.

The framework is replica-centric: it explicitly models planned disruption budgets, instance discriminators, GPU quotas, and system-level capacity buffers to balance fault tolerance, operational resilience, and utilization efficiency. It also confronts real-world constraints such as hardware-pool fragmentation, limited fault-domain diversity, and networking limits, and offers pragmatic corrections for small workloads and constrained pools. The result is predictable availability during routine operational events, together with materially better GPU packing efficiency in production inference systems.
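To make the replica-centric idea concrete, here is a minimal sketch of how a replica count and a GPU count might be derived from traffic, a disruption budget, and a capacity buffer. It is an illustration under stated assumptions, not the paper's actual model: the class, field names, formula, and example numbers are all hypothetical, and the sketch ignores the pool fragmentation, fault-domain, and networking constraints the paper addresses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadPlan:
    """Hypothetical planning inputs; none of these names come from the paper."""
    peak_qps: float              # peak traffic the service must absorb
    qps_per_replica: float       # measured sustainable throughput of one replica
    disruption_budget: int       # replicas that may be unavailable during planned events
    buffer_fraction: float       # extra headroom, e.g. 0.25 for 25%
    mig_slices_per_replica: int  # MIG slices one replica occupies
    mig_slices_per_gpu: int      # slices each GPU is partitioned into


def required_replicas(plan: WorkloadPlan) -> int:
    """Replicas needed to serve buffered peak traffic while the
    disruption budget is fully consumed (assumed formula)."""
    buffered_qps = plan.peak_qps * (1 + plan.buffer_fraction)
    serving = math.ceil(buffered_qps / plan.qps_per_replica)
    return serving + plan.disruption_budget


def required_gpus(plan: WorkloadPlan) -> int:
    """GPUs needed if replicas are packed onto MIG slices with no fragmentation."""
    replicas_per_gpu = plan.mig_slices_per_gpu // plan.mig_slices_per_replica
    if replicas_per_gpu < 1:
        raise ValueError("one replica does not fit on a single GPU")
    return math.ceil(required_replicas(plan) / replicas_per_gpu)


# Illustrative numbers only, not measurements from the paper.
plan = WorkloadPlan(
    peak_qps=1000,
    qps_per_replica=100,
    disruption_budget=2,
    buffer_fraction=0.25,
    mig_slices_per_replica=2,
    mig_slices_per_gpu=7,
)
print(required_replicas(plan))  # 15
print(required_gpus(plan))      # 5
```

With these inputs, 13 replicas carry the buffered peak, 2 more cover the disruption budget, and three 2-slice replicas fit on each 7-slice GPU, so the idealized plan needs 5 GPUs. A real pool would need additional correction for fragmentation and fault-domain spread.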

Keywords

AI inference, GPU capacity planning, Multi-Instance GPU, cloud infrastructure, production ML systems, reliability engineering.

Citation

A. Gupta, "Characterizing GPU Capacity and Computational Cost in Production AI Inference Using a Replica-Centric Measurement Framework," in 2026 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0 & IoT), Rome, Italy, Jun. 2026.

Accepted for presentation; to appear in IEEE Xplore.