Microsoft at ASPLOS 2024: Advancing hardware and software for high-scale, secure, and efficient modern applications
Publication Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale Kunal Jain, A. Parayil, Ankur Mallick, Rujia Wang, Renee St. Amant, Chetan Bansal, Victor Ruehle, Saravan Rajmohan, Shashwat Jaiswal, Yogesh Simmhan, Anoop Kulkarni, Steve Kofsky ACM Sigmetrics 2026 | June 2026 Project
Publication DroidSpeak: Efficient Context Sharing for Multiple-LLM Inference Yuhan Liu, Yuyang Huang, Jiayi Yao, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse NSDI | May 2026 Project
Publication Harvesting Spare CPU Resources in Container Systems Adam Hall, Anirudh Sarma, Esha Choukse, Kishore Ramachandran, Sameh Elnikety NSDI | May 2026
Publication Concord: Learning Network Configuration Contracts Ryan Beckett, Francis Y. Yan, Raghunadha Reddy Pocha, Vineesh V. Raj, Ayyub Shaik, Siva Kesava Reddy Kakarla 2026 European Conference on Computer Systems | April 2026
Publication Niyama: Breaking the Silos of LLM Inference Serving Kanishk Goel, Jayashree Mohan, Nipun Kwatra, Ravi Shreyas Anupindi, Ramachandran Ramjee Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2026 | March 2026 Project
Publication MeshAgent: Enabling Reliable Network Management with Large Language Models Yajie Zhou, Kevin Hsieh, Sathiya Kumaran Mani, Srikanth Kandula, Zaoxing Liu SIGMETRICS’26 | December 2025
Publication PnM: Efficient Intra-Datacenter Calls Packing for Large Conferencing Services Rohan Gandhi, Ankur Mallick ACM SoCC | November 2025
Publication ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving Haoran Qiu, Anish Biswas, Zihan Zhao, Jayashree Mohan, Alind Khare, Esha Choukse, Íñigo Goiri, Zeyu Zhang, Haiying Shen, Chetan Bansal, Ramachandran Ramjee, Rodrigo Fonseca ACM Symposium on Cloud Computing (SoCC) 2025 | November 2025 Project
Publication Distributed AI Platform for the 6G RAN Ganesh Ananthanarayanan, Matthew Balkwill, Xenofon Foukas, Zhihua Lai, Bozidar Radunovic, Connor Settle, Yongguang Zhang ACM MobiCom Open-AI RAN Workshop | November 2025 Project
Publication From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models Xingqi Cui, Chieh-Jan Mike Liang, Jiarong Xing, Haoran Qiu November 2025 Project