FastWan-QAD-1.3B

Introduction

FastWan-QAD-1.3B is the fastest variant of the FastWan-QAD series and targets the RTX 5090. It pairs NVFP4-quantized linear layers with the SageAttention3 FP4 attention backend and generates a 5-second 480p video end to end in 1.78 seconds, over 3.4× faster than prior distilled models on the same hardware (see Performance).

The model is built on Wan-AI/Wan2.1-T2V-1.3B-Diffusers and trained with quantization-aware distillation (QAD), jointly optimizing for low-bit precision and 3-step inference quality.

Hardware requirement: RTX 5090 (sm100+). NVFP4 is a Blackwell-native format and is not supported on older GPUs. For an alternative that uses SageAttention2++, see FastWan-QAD-1.3B-SA2; for RTX 4090 support, see FastWan-QAD-FP8-1.3B.
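
The hardware requirement can be checked before launching a run. The following is a minimal sketch, not part of FastVideo: the helper names `supports_nvfp4` and `current_gpu_supports_nvfp4` are hypothetical, and the threshold (compute capability major version 10 or higher) is simply taken from the "sm100+" statement above. It relies only on the standard PyTorch calls `torch.cuda.is_available` and `torch.cuda.get_device_capability`.

```python
def supports_nvfp4(capability):
    """Return True if a (major, minor) compute capability meets the sm100+ requirement.

    Assumption: "sm100+" is read as "major version >= 10".
    """
    major, _minor = capability
    return major >= 10


def current_gpu_supports_nvfp4():
    """Check the first visible GPU; returns False if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # get_device_capability returns a (major, minor) tuple, e.g. (8, 9) on an RTX 4090.
    return supports_nvfp4(torch.cuda.get_device_capability(0))
```

Calling `current_gpu_supports_nvfp4()` inside the container before generation gives an early, readable failure instead of a kernel error; if it returns False, the FP8 variant listed above is the one intended for RTX 4090.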

Model Overview

3-step inference via quantization-aware distillation
NVFP4 linear layers for maximum throughput on Blackwell GPUs
SageAttention3 FP4 backend for attention computation
Trained at 480p (832×480) resolution, 81 frames (5 seconds at 16 fps)
No classifier-free guidance at inference time
Fast decoding via TAEHV tiny autoencoder
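
To give a feel for what NVFP4 quantization does to a weight or activation vector, here is a simplified pure-Python sketch of block-scaled FP4 "fake quantization". It is an illustration only, not the kernel used by this model. It assumes the commonly described NVFP4 layout: 4-bit E2M1 values (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) sharing one scale per block of 16 elements. Real NVFP4 stores that block scale in FP8 and adds a per-tensor scale; both are omitted here, and the function names are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit E2M1 value (sign handled separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one shared scale per 16 consecutive elements


def quantize_block(block):
    """Round one block to the E2M1 grid and map it back to real values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Choose the scale so the largest magnitude lands on the top grid value (6.0).
    # Simplification: the scale stays a full-precision float here.
    scale = amax / E2M1_MAGNITUDES[-1]
    out = []
    for x in block:
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        out.append(math.copysign(nearest * scale, x))
    return out


def fake_quantize(values):
    """Apply block-wise quantize/dequantize to a flat list of floats."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        out.extend(quantize_block(values[start:start + BLOCK_SIZE]))
    return out
```

For example, in a block whose largest magnitude is 6.0 the scale is 1, so a value of 2.4 is rounded to 2.0 while 3.0 and 0.5 pass through unchanged. Quantization-aware distillation trains the model with this kind of rounding in the loop so that the final low-bit weights still produce good 3-step samples.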

Performance

Model	Hardware	Generation Time (5s 480p)
FastWan-QAD-1.3B	RTX 5090	1.78s
FastWan-QAD-1.3B-SA2	RTX 5090	~2.0s
FastWan-QAD-FP8-1.3B	RTX 4090	~3.4s
TurboDiffusion	RTX 5090	6.10s
LightX2V	RTX 5090	6.91s
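
The speedup figure quoted in the introduction follows directly from the RTX 5090 rows of this table. A short sketch of the arithmetic, using only the exact timings listed above (the approximate "~" entries are left out, and the `speedup` helper is purely illustrative):

```python
# Generation times in seconds for a 5-second 480p video on an RTX 5090,
# copied from the table above.
GENERATION_TIME_S = {
    "FastWan-QAD-1.3B": 1.78,
    "TurboDiffusion": 6.10,
    "LightX2V": 6.91,
}


def speedup(baseline, candidate="FastWan-QAD-1.3B"):
    """Return how many times faster `candidate` is than `baseline`."""
    return GENERATION_TIME_S[baseline] / GENERATION_TIME_S[candidate]


for name in ("TurboDiffusion", "LightX2V"):
    print(f"{name}: {speedup(name):.2f}x")
# TurboDiffusion: 3.43x
# LightX2V: 3.88x
```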

Inference

docker run --gpus all --ipc=host --rm -it ghcr.io/hao-ai-lab/fastvideo/fastvideo-dev:py3.12-sha-f889e6b bash

# the container should start in /FastVideo with the venv already activated
git fetch && git checkout main
# build fastvideo-kernel
cd fastvideo-kernels/ && ./build.sh && cd ..
git clone https://github.com/madebyollin/taehv
uv pip install ./taehv

# run generation:
FASTVIDEO_DISABLE_ATTENTION_COMPILE=0 FASTVIDEO_ATTENTION_BACKEND=ATTN_QAT_INFER python examples/inference/optimizations/FastWan_QAD_TAEHV.py --model FastVideo/FastWan-QAD-1.3B --distilled_model "" --taehv_checkpoint taehv/taew2_1.pth

Training

More details coming soon.

If you find this model useful, please cite our paper:

@article{Zhang2026AttnQAT,
  title={Attn-QAT: 4-Bit Attention With Quantization-Aware Training},
  author={Zhang, Peiyuan and Noto, Matthew and Tan, Wenxuan and Jiang, Chengquan and Lin, Will and Zhou, Wei and Zhang, Hao},
  journal={arXiv preprint arXiv:2603.00040},
  year={2026}
}
