Attn-QAT: Making 4-Bit Attention Actually Work
April 8, 2026
Peiyuan Zhang*, Matthew Noto*, Wenxuan Tan*, Chengquan Jiang, Will Lin, Wei Zhou, Hao Zhang

From Physical Commonsense to Scientific Reasoning: Why World Modeling in Video Matters
February 12, 2026
Lanxiang Hu, Abhilash Shankarampeta, Yixin Huang, Zilin Dai, Haoyang Yu, Yujie Zhao, Haoqiang Kang, Daniel Zhao, Tajana Rosing, Hao Zhang

CAD: Disaggregating Core Attention for Efficient Long-Context Language Model Training
December 17, 2025
Yonghao Zhuang*, Junda Chen*, Bo Pang, Yi Gu, Yibo Zhu, Yimin Jiang, Ion Stoica, Eric Xing, Hao Zhang

Fast and Accurate Causal Parallel Decoding Using Jacobi Forcing
December 16, 2025
Lanxiang Hu*, Siqi Kou*, Yichao Fu, Samyam Rajbhandari, Tajana Rosing, Yuxiong He, Zhijie Deng, Hao Zhang

AUP: When Accuracy Meets Parallelism in Diffusion Language Models
December 10, 2025
Yu-Yang Qian, Junda Su, Lanxiang Hu, Peiyuan Zhang, Zhijie Deng, Peng Zhao, Hao Zhang

Scaling Speculative Decoding with Lookahead Reasoning
September 22, 2025
Yichao Fu, Yiming Zhao, Rui Ge, Hao Zhang