FAISys 2025

Friday, November 14
(subject to change)

Welcome Address 8:50AM
Keynote 9:00AM
Break 10:00AM (1/F, near LT-1)
Grand Challenges 10:15AM
Lunch 12:00PM (The Stage, 3/F)
Regular Paper Session 1: Inference Optimization 1:40PM
Session chair: Chuan Wu (HKU)
● TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving Feng Ren, Tsinghua University; Ruoyu Qin, Tsinghua University & Moonshot AI; Teng Ma, Shangming Cai, Zheng Liu, Alibaba Group; Chao Lei, Dejiang Zhu, Ant Group; Ke Yang, Jinyang Su, Approaching AI; Weixiao Huang, Yikai Zhao, Moonshot AI; Yongwei Wu, Weimin Zheng, Mingxing Zhang, Tsinghua University
● BeLLMan: Controlling LLM Congestion Tella Rajashekhar Reddy, Atharva Deshmukh, Karan Tandon, Rohan Gandhi, Anjaly Parayil, Debopam Bhattacherjee, Microsoft
● EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration Jiangsu Du, Hongbin Zhang, Taosheng Wei, Zhenyi Zheng, Kaiyi Wu, Zhiguang Chen, Yutong Lu, Sun Yat-sen University
● Efficient RL for LLMs with Dynamic and Online Speculative Decoding Chao Jin, Yinmin Zhong, Zili Zhang, Peking University; Yimin Jiang, Anuttacon; Yibo Zhu, Stepfun
● Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately Yuhang Wang and Youhe Jiang, SJTU; Bin Cui, Peking University; Fangcheng Fu, SJTU
Break 3:20PM (1/F, near LT-1)
Selected arXiv Talks 1: Cloud & Mobile 3:50PM
Recess 5:10PM
Bus to Banquet 5:30PM
Banquet 6:00PM (Lei Garden, Shatin)

Saturday, November 15
(subject to change)

Selected arXiv Talks 2: RL 9:00AM
Break 10:20AM (Space near 204)
Regular Paper Session 2: RL & LLM Training 10:40AM
Session chair: Zeke Wang (Zhejiang University)
● RFabric: A Reconfigurable Network for the Rhythms of Disaggregated RL Xin Tan and Yicheng Feng, The Chinese University of Hong Kong; Yu Zhou, Yimin Jiang, and Yibo Zhu, StepFun; Hong Xu, The Chinese University of Hong Kong
● GovBench: From Natural Language to Executable Pipelines, A Benchmark for Data Governance Automation Zhou Liu, Peking University; Zhaoyang Han, Huazhong University of Science and Technology; Guochen Yan, Hao Liang, Wentao Zhang, and Bin Cui, Peking University
● Accelerating Generation in RLHF by Phase-Aware Tensor Parallelism Long Zhao and Qinghe Wang, Anhui University; Jiaan Zhu, Youhui Bai, and Zewen Jin, University of Science and Technology of China; Chaoyi Ruan, National University of Singapore; Shengnan Wang, Zhejiang University; Cheng Li, University of Science and Technology of China
● Galvatron-2: An Automatic Distributed System for Efficient Foundation Model Training Xinyi Liu, Yujie Wang, and Shenhan Zhu, Peking University; Fangcheng Fu, Shanghai Jiao Tong University; Qingshuo Liu, Guangming Lin, Ziyi Guo, and Bin Cui, Peking University
Lunch 12:00PM (The Stage, 3/F)
Selected arXiv Talks 3: Inference across the Whole Stack 1:40PM
Concluding Remarks 3:00PM
Session chair: Yimin Jiang (Anuttacon)
Session chair: Wei Wang (HKUST)
Session chair: Xiaosong Ma (MBZUAI)
Prof. Jidong Zhai, Tsinghua University
Jidong Zhai is a Professor in the Department of Computer Science and Technology at Tsinghua University. He was a Visiting Professor at Stanford University from 2015 to 2016 and a Visiting Scholar at Microsoft Research Asia (MSRA) in 2013. His current research interests include parallel computing, compilers, and performance evaluation. He has published more than 100 papers in prestigious conferences (such as SC, ICS, PPoPP, ASPLOS, MICRO, OSDI, ATC, and PACT) and top-tier journals (such as IEEE TC and IEEE TPDS). His research received the 2021 IEEE TPDS Best Paper Award, the Best Paper Award at CLUSTER'21, the Best Student Paper Award at ICS'21, the Best Paper Honorable Mention Award at ICDCS'20, and was a Best Paper Finalist at SC'14. He has served as a PC member for a number of international conferences, including SC, ICS, PPoPP, IPDPS, ICPP, CLUSTER, and PACT, and was a program co-chair of NPC 2018. He currently serves on the editorial boards of IEEE Transactions on Computers (TC), IEEE Transactions on Cloud Computing (TCC), the Journal of Parallel and Distributed Computing (JPDC), and Parallel Computing (PARCO).
Prof. Minchen Yu, CUHK (SZ)
Minchen Yu is an Assistant Professor at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. He received his Ph.D. from the Hong Kong University of Science and Technology and his B.Eng. from Nanjing University. His research interests cover cloud computing and distributed systems, with a recent focus on serverless computing and machine learning systems. His work has been published at prestigious venues including NSDI, ATC, EuroSys, INFOCOM, and SoCC. He received the Best Paper Runner-Up Award at IEEE ICDCS 2021.
Zhenyu Guo, Ant Group
Zhenyu Guo is the director of Data and AI Infrastructure at Ant Security. His recent work focuses on building novel systems and algorithms for training, simulation, and serving in risk-related scenarios. Previously, he was the leader of the Distributed Computing Group at Ant Group and a researcher at Microsoft Research Asia. His research interest spans compilers, distributed systems, big data, and large models, with an emphasis on co-designing these components to build accessible, controllable, adaptable, and trustworthy systems. He has authored 20+ papers in top-tier conferences, such as OSDI, NSDI, SIGMOD, and HotOS.
Bojie Li, Pine AI
Bojie Li is the Co‑Founder and Chief Scientist at Pine AI. He received his Ph.D. in Computer Science in 2019 from the University of Science and Technology of China (USTC) in collaboration with Microsoft Research Asia (MSRA). His research has appeared at top venues including SIGCOMM, SOSP, NSDI, ATC, and PLDI. He is a recipient of the ACM China Doctoral Dissertation Award and the Microsoft Research Asia Ph.D. Fellowship.

Keynote & Challenge Info
Keynote speaker: Prof. Ji-Rong Wen, Renmin University of China
Ji-Rong Wen is a Professor and Founding Dean of the Gaoling School of Artificial Intelligence at Renmin University of China. He previously served as a Senior Researcher and Manager of the Web Search and Mining Group at Microsoft Research Asia. Over his career, he has published 500+ papers in leading conferences and journals, with 50,000+ citations and an H-index of 100+. In recent years, he has focused on large-scale foundation models, leading teams that developed "Wenlan," the first Chinese multimodal large model; the "Yulan" series of large language models; and LLaDA, the first open-source large language diffusion model.
Keynote: "LLaDA: A New Paradigm for Large Language Diffusion Models"
Abstract: This talk focuses on a single question: is autoregression the only path to state-of-the-art generative AI, and beyond? I will first survey the evolution of generative models through the lens of unified probabilistic modeling and argue that core LLM properties (scalability, instruction following, in-context learning, multi-turn conversation, and lossless compression) stem primarily from the generative objective rather than from autoregression per se. Building on this perspective, I will introduce our LLaDA series of diffusion-based large language models, covering core theory, scaling laws, large-scale training, SFT, alignment, MoE, and multimodal modeling. Using a non-autoregressive approach, LLaDA exhibits surprising scalability and strong multi-turn conversational abilities. Together, our results show that diffusion is a highly promising path for large language modeling and challenge the prevailing assumption that core LLM capabilities inherently depend on autoregression.
© FAISys 2025 All Rights Reserved