arxiv:2607.01642

Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

Published on Jul 2

Abstract

MrFlow accelerates text-to-image diffusion by combining low-resolution generation with pixel-space super-resolution and noise injection, achieving 10x end-to-end speedup without any training, and up to 25x when combined with pretrained timestep distillation.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Hardware-agnostic strategies for accelerating text-to-image diffusion, such as timestep distillation and feature caching, can reduce inference time without custom kernels or system-level optimization. Among them, multi-resolution generation strategies have recently received broad attention, attaining more than 5x speedup without any training. However, because these methods upsample in the latent space and selectively modify only some regions, they exhibit noticeable blurring or artifacts. To address this, we propose MrFlow, a training-free multi-resolution acceleration strategy for pretrained flow-matching models built upon a staged low-to-high-resolution pipeline. MrFlow first rapidly generates the main structure at low resolution, then performs super-resolution in pixel space using a lightweight pretrained GAN-based model, then injects low-strength noise to enable high-frequency resampling, and finally refines the details at high resolution. Quantitative and qualitative results on FLUX.1-dev and Qwen-Image show that MrFlow exploits the quadratic token reduction and reduced step requirement of low-resolution sampling to achieve 10x end-to-end acceleration while keeping the OneIG score within 1% of the unaccelerated baseline, significantly surpassing other training-free acceleration strategies and requiring neither training nor dynamic identification at runtime. MrFlow is also orthogonal to pretrained timestep distillation strategies and can be combined with them directly, reaching up to 25x acceleration.
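The staged pipeline described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: images are plain lists of floats, `sample_low_res`, `super_resolve`, and `oracle_velocity` are hypothetical stand-ins for the low-resolution sampler, the GAN upscaler, and the pretrained flow-matching model, and the noise injection assumes the common rectified-flow convention x_t = (1 - t) * x0 + t * eps. Any VAE decode or encode around the pixel-space upscaler is omitted.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]  # toy stand-in for an image or latent tensor


def inject_noise(x0: Image, strength: float, rng: random.Random) -> Tuple[Image, Image]:
    """Assumed flow-matching interpolation x_t = (1 - t) * x0 + t * eps at t = strength."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [(1.0 - strength) * a + strength * e for a, e in zip(x0, eps)]
    return x_t, eps


def euler_refine(x_t: Image, t_start: float, num_steps: int,
                 velocity: Callable[[Image, float], Image]) -> Image:
    """Integrate dx/dt = v(x, t) from t_start down to 0 with uniform Euler steps."""
    dt = t_start / num_steps
    x, t = list(x_t), t_start
    for _ in range(num_steps):
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
        t -= dt
    return x


def staged_sample(sample_low_res: Callable[[], Image],
                  super_resolve: Callable[[Image], Image],
                  velocity: Callable[[Image, float], Image],
                  strength: float, refine_steps: int,
                  rng: random.Random) -> Image:
    """Low-res structure -> upscale -> low-strength noise -> high-res refinement."""
    coarse = sample_low_res()
    upscaled = super_resolve(coarse)
    x_t, _ = inject_noise(upscaled, strength, rng)
    return euler_refine(x_t, strength, refine_steps, velocity)


# Hypothetical toy components, for illustration only.
target = [0.1, -0.1, 0.9, 1.1]  # "detailed" high-resolution signal


def sample_low_res() -> Image:
    return [0.0, 1.0]  # coarse structure only


def super_resolve(img: Image) -> Image:
    return [v for v in img for _ in range(2)]  # nearest-neighbour 2x upscale


def oracle_velocity(x: Image, t: float) -> Image:
    # Ideal velocity when the data distribution is a single point `target`.
    return [(xi - ti) / t for xi, ti in zip(x, target)]


refined = staged_sample(sample_low_res, super_resolve, oracle_velocity,
                        strength=0.25, refine_steps=1, rng=random.Random(0))
print(refined)
```

With an oracle velocity, the refinement step restores the fine detail that the upscaled image lacks. A real model only approximates this, which is presumably why the injected noise strength is kept low: the structure from the first stage is preserved and only the high-frequency content is resampled.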

Community

Xingyu-Zheng

Paper submitter about 5 hours ago

MrFlow proposes a training-free multi-resolution strategy for accelerating image generation, following a clear coarse-to-fine pipeline: multi-step low-resolution structure sampling, pixel-space super-resolution, and one-step high-resolution detail refinement. This design achieves faithful generation with up to 10x end-to-end speedup, establishing a new SOTA among training-free diffusion acceleration methods. Moreover, MrFlow is orthogonal to pretrained timestep distillation methods, so the two can be combined directly, pushing the end-to-end speedup up to 25x. Overall, the work is simple but effective.
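To see why a coarse-to-fine schedule saves compute, a rough cost count helps. The sketch below is a back-of-the-envelope model, not a measurement: it assumes a latent transformer with an 8x VAE downsample and 2x2 patches (so token count scales with image area), models per-step cost as tokens raised to an assumed exponent, and ignores the upscaler and VAE overhead. The step counts and resolutions are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Stage = Tuple[int, int, int]  # (num_steps, height, width)


def token_count(height: int, width: int,
                vae_downsample: int = 8, patch_size: int = 2) -> int:
    """Tokens per image under the assumed VAE stride and patch size."""
    stride = vae_downsample * patch_size
    return (height // stride) * (width // stride)


def relative_cost(stages: List[Stage], exponent: float = 1.0) -> float:
    """Sum of steps * tokens**exponent; exponent=1 models linear layers,
    exponent=2 models attention-dominated cost."""
    return sum(steps * token_count(h, w) ** exponent for steps, h, w in stages)


# Hypothetical schedules, for illustration only.
baseline = [(50, 1024, 1024)]              # all steps at full resolution
staged = [(10, 512, 512), (1, 1024, 1024)]  # few low-res steps + one high-res step

speedup_linear = relative_cost(baseline, 1.0) / relative_cost(staged, 1.0)
speedup_attention = relative_cost(baseline, 2.0) / relative_cost(staged, 2.0)
print(f"linear-cost speedup:    {speedup_linear:.1f}x")
print(f"attention-cost speedup: {speedup_attention:.1f}x")
```

Halving each side cuts the token count by four, and fewer steps are spent at every stage, so both factors compound. Real end-to-end numbers depend on the model, the upscaler, and the chosen schedule.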