Gongfan Fang

Ph.D. Candidate | xML Lab | National University of Singapore.


Hi there! I'm Gongfan Fang, a final-year Ph.D. candidate at the xML Lab, National University of Singapore, supervised by Prof. Xinchao Wang (Presidential Young Professor). I received my B.Eng. (2019) and M.Eng. (2022) from the VIPA Lab, Zhejiang University, advised by Prof. Mingli Song.

I'm currently working on Efficient Large Language Models and GenAI, with an emphasis on LLM Reasoning, Model Efficiency, and Diffusion Language Models. I'm also the creator and lead developer of Torch-Pruning, a leading framework for accelerating foundation models, which has been integrated into many industrial products such as NVIDIA TAO (see the ACK). During my Ph.D., I worked with the amazing DLER team at NVIDIA, and I was awarded the 2024 ByteDance Scholarship (10~15 recipients per year).

I'm currently on the job market and open to both academic and industrial opportunities starting in March 2026. Please feel free to reach out via email (gongfan at u.nus.edu) if there’s a potential fit.



News

Sep, 2025 🍺 Three papers, Thinkless, dKV-Cache, and VeriThinker, were accepted by NeurIPS’25.
Feb, 2025 🌟 One first-author paper TinyFusion (Highlight) and three co-authored papers were accepted by CVPR’25.
Dec, 2024 🎵 I’m deeply honored to be awarded the 2024 ByteDance Scholarship (10~15 recipients per year).
Sep, 2024 🚀 Two first-author papers MaskLLM (Spotlight) and Remix-DiT were accepted by NeurIPS’24.

Selected Publications


Full Paper List | Citations: 3003
  1. NeurIPS’25
    Thinkless: LLM Learns When to Think
    Gongfan Fang, Xinyin Ma, and Xinchao Wang
    Advances in Neural Information Processing Systems, 2025
    National University of Singapore
    Auto Switch between Long-Short Reasoning via Decoupled GRPO | Cuts 50%-90% of Unnecessary Thinking | Stop Overthinking 1+1=?
  2. NeurIPS’24
    MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
    Advances in Neural Information Processing Systems, 2024
    NVIDIA Research, National University of Singapore
    NeurIPS’24 Spotlight (2%) | Post-training of Sparse LLMs | The First Scalable Algorithm for N:M Sparsity in LLMs | 1.5x Faster with 30%+ Memory Saving
  3. CVPR’23
    DepGraph: Towards Any Structural Pruning
    Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
    National University of Singapore, Zhejiang University, Huawei
    500+ Citations, 3000+ Stars, 300,000+ Downloads | Github #Model-Compression Top-5 | Pruning of Foundation Models | Integrated in NVIDIA TAO
  4. CVPR’24
    DeepCache: Accelerating Diffusion Models for Free
    Xinyin Ma, Gongfan Fang, and Xinchao Wang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
    National University of Singapore
    Training-free and almost lossless | 2-7x Speedup on Diffusion Models

Education

2022.07 - 2026.06 - Ph.D. in Electrical and Computer Engineering, National University of Singapore.

2019.09 - 2022.04 - M.Eng. in Computer Science, College of Computer Science and Technology, Zhejiang University.

2015.09 - 2019.06 - B.Eng. in Computer Science, College of Computer Science and Technology, Zhejiang University.