Gongfan Fang

Ph.D. Candidate | xML Lab | National University of Singapore.


Hi there! I'm Gongfan Fang, a final-year Ph.D. candidate at the xML Lab, National University of Singapore, supervised by Prof. Xinchao Wang (Presidential Young Professor). I received my B.Eng. (2019) and M.Eng. (2022) from the VIPA Lab, Zhejiang University, advised by Prof. Mingli Song.

I'm currently working on Efficient Large Language Models and GenAI, with an emphasis on LLM Reasoning, Model Efficiency, and Diffusion Language Models. I'm also the creator and lead developer of Torch-Pruning, a leading framework for accelerating foundation models, which has been integrated into many industrial products such as NVIDIA TAO (see the ACK). During my Ph.D., I worked with the amazing DLER team at NVIDIA, and I was awarded the 2024 ByteDance Scholarship (10~15 recipients per year).

I'm currently on the job market and open to both academic and industrial opportunities starting in March 2026. Please feel free to reach out via email (gongfan at u.nus.edu) if there’s a potential fit.



News

Sep, 2025 🍺 Three papers, Thinkless, dKV-Cache, and VeriThinker, were accepted by NeurIPS’25.
Feb, 2025 🌟 One first-author paper TinyFusion (Highlight) and three co-authored papers were accepted by CVPR’25.
Dec, 2024 🎵 I’m deeply honored to be awarded the 2024 ByteDance Scholarship (10~15 recipients per year).
Sep, 2024 🚀 Two first-author papers MaskLLM (Spotlight) and Remix-DiT were accepted by NeurIPS’24.

Selected Publications


Full Paper List | Citations: 3003
  1. NeurIPS’25
    Thinkless: LLM Learns When to Think
    Gongfan Fang, Xinyin Ma, and Xinchao Wang
    Advances in Neural Information Processing Systems, 2025
    National University of Singapore
    Auto Switch between Long-Short Reasoning via Decoupled GRPO | Cuts 50%-90% of Unnecessary Thinking | Stop Overthinking 1+1=?
  2. NeurIPS’24
    MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
    Advances in Neural Information Processing Systems, 2024
    NVIDIA Research, National University of Singapore
    NeurIPS’24 Spotlight (2%) | Post-training of Sparse LLMs | The First Scalable Algorithm for N:M Sparsity in LLMs | 1.5x Faster with 30%+ Memory Saving
  3. CVPR’23
    DepGraph: Towards Any Structural Pruning
    Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
    National University of Singapore, Zhejiang University, Huawei
    500+ Citations, 3000+ Stars, 300,000+ Downloads | Github #Model-Compression Top-5 | Pruning of Foundation Models | Integrated in NVIDIA TAO
  4. CVPR’24
    DeepCache: Accelerating Diffusion Models for Free
    Xinyin Ma, Gongfan Fang, and Xinchao Wang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
    National University of Singapore
    Training-free and almost lossless | 2-7x Speedup on Diffusion Models

Education

2022.07 - 2026.06 - Ph.D. in Electrical and Computer Engineering, National University of Singapore.

2019.09 - 2022.04 - M.Eng. in Computer Science, College of Computer Science and Technology, Zhejiang University.

2015.09 - 2019.06 - B.Eng. in Computer Science, College of Computer Science and Technology, Zhejiang University.