arxiv:2607.01642

Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

Published on Jul 2 · Submitted by Xingyu Zheng on Jul 3

Abstract

MrFlow accelerates text-to-image diffusion by generating at low resolution, applying pixel-space super-resolution, injecting low-strength noise, and refining at high resolution. It achieves 10x end-to-end speedup without training, and up to 25x when combined with timestep distillation.

Hardware-agnostic strategies for accelerating text-to-image diffusion, such as timestep distillation and feature caching, can reduce inference time without custom kernels or system-level optimization. Among them, multi-resolution generation strategies have recently received broad attention, attaining more than 5x speedup without any training. However, because these methods upsample in the latent space and selectively modify only some regions, they exhibit noticeable blurring or artifacts. To address this, we propose MrFlow, a training-free multi-resolution acceleration strategy for pretrained flow-matching models, built on a staged low-to-high-resolution pipeline. MrFlow first rapidly generates the main structure at low resolution, then performs super-resolution in the pixel space using a lightweight pretrained GAN-based model, then injects low-strength noise to enable high-frequency resampling, and finally refines the details at high resolution. Quantitative and qualitative results on FLUX.1-dev and Qwen-Image show that MrFlow exploits the quadratic token reduction and the reduced step requirement of low-resolution sampling to achieve 10x end-to-end acceleration while keeping the OneIG score within 1% of the unaccelerated baseline. This significantly surpasses other training-free acceleration strategies and requires neither training nor dynamic identification at runtime. MrFlow is also orthogonal to pretrained timestep distillation strategies and can be combined with them directly, reaching a generation speedup of up to 25x.
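The upsample, noise-injection, and refinement stages described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the linear interpolation path x_t = (1 - t) * x + t * noise is the common rectified-flow convention and is assumed here, nearest-neighbor upsampling stands in for the GAN-based super-resolution model, and `oracle_velocity` is a hypothetical stand-in for the pretrained flow-matching network. Any VAE encoding or decoding between pixel and latent space is omitted.

```python
import numpy as np

def upsample_nearest(img, factor):
    """Placeholder for the super-resolution model: nearest-neighbor upsampling."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def inject_noise(x_clean, strength, rng):
    """Move a clean sample to flow time t=strength on the assumed path
    x_t = (1 - t) * x_clean + t * noise (t=0 is clean, t=1 is pure noise)."""
    noise = rng.standard_normal(x_clean.shape)
    return (1.0 - strength) * x_clean + strength * noise

def euler_refine(x_t, t_start, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t_start down to 0 with Euler steps."""
    ts = np.linspace(t_start, 0.0, num_steps + 1)
    x = x_t
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x

rng = np.random.default_rng(0)

# Stage 1 (stand-in): a low-resolution sample holding the main structure.
low_res = rng.standard_normal((8, 8))

# Stage 2: upsample to the target resolution.
upsampled = upsample_nearest(low_res, 2)

# Hypothetical "detailed" image the toy model steers toward.
target = upsampled + 0.05 * rng.standard_normal(upsampled.shape)

def oracle_velocity(x, t):
    """Toy velocity field that points along the straight path away from target."""
    return (x - target) / t

# Stage 3: inject low-strength noise so high frequencies can be resampled.
strength = 0.2
noisy = inject_noise(upsampled, strength, rng)

# Stage 4: refine at high resolution, here with a single Euler step.
refined = euler_refine(noisy, strength, oracle_velocity, num_steps=1)
```

With the oracle velocity, a single Euler step lands exactly on `target`; a real model would only approximate this, and the noise strength controls how much of the upsampled image is resampled.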

Community

MrFlow proposes a training-free multi-resolution strategy for accelerating image generation, following a clear coarse-to-fine pipeline: multi-step low-resolution structure sampling, pixel-space super-resolution, and one-step high-resolution detail refinement. This design achieves faithful generation with up to 10x end-to-end speedup, establishing a new SOTA among training-free diffusion acceleration methods. Moreover, MrFlow is orthogonal to pretrained timestep distillation methods, so the two can be combined directly, pushing the end-to-end speedup up to 25x. Overall, the work is simple but effective.
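The benefit of doing most sampling steps at low resolution can be illustrated with rough token arithmetic. The sketch below assumes one transformer token per 16x16 pixel patch (for example, an 8x VAE downsampling followed by 2x2 patchification); that figure and the two resolutions are illustrative assumptions, not values taken from the paper, and the result is not an estimate of the reported speedups.

```python
def num_tokens(height, width, pixels_per_token_side=16):
    """Token count for an image, assuming one token per square pixel patch.
    The default patch side of 16 is an assumption for illustration."""
    return (height // pixels_per_token_side) * (width // pixels_per_token_side)

high = num_tokens(1024, 1024)   # tokens at the assumed target resolution
low = num_tokens(512, 512)      # tokens at half the side length

# Halving each side cuts the token count by 4 (quadratic in side length).
token_ratio = high / low

# Self-attention compares every token pair, so its cost scales with tokens squared.
attention_ratio = token_ratio ** 2

print(high, low, token_ratio, attention_ratio)
```

Per-step cost therefore drops by a factor between the token ratio (layers linear in tokens) and the attention ratio (pairwise attention), before counting any reduction in the number of steps.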


