QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging
Abstract
QG-MIL introduces a gated transformer aggregator for multiple instance learning in medical imaging that stabilizes attention distribution and improves prediction consistency across different medical domains.
Attention-based Multiple Instance Learning (MIL) aggregators in medical imaging are prone to attention concentration, which produces overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this problem through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL on six benchmarks spanning whole-slide pathology and cell-level hematology, two fundamentally different MIL scales. The best-performing QG-MIL variants outperform leading baselines on all six benchmarks, with an average improvement of +6.1 mean macro F1 points. Attention overlays and attention mass analysis confirm more distributed instance weighting. Ablation studies show that, although individual components can match the full model on specific datasets, the complete QG-MIL design provides the most consistent cross-domain performance and the lowest variance compared with the selected baselines. We release a configurable implementation to support reproducibility: https://github.com/unica-visual-intelligence-lab/QG-MIL
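To make the four components concrete, the following is a minimal pure-Python sketch of one transformer block that combines them, assuming the standard textbook definition of each piece. It is not the authors' implementation (the linked repository is the reference for that). The class name GatedBlock, the random weight initialization, the placement of an elementwise sigmoid gate on the concatenated head outputs before the output projection, and a QK-norm gain shared across heads are all illustrative assumptions.

```python
import math
import random

# Hypothetical sketch of one QG-MIL-style block; NOT the authors' code.

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: divide by the root-mean-square (no mean subtraction, no bias), then apply a gain
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def matvec(w, x):
    # w is a list of rows (out_dim x in_dim); returns w @ x
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def silu(z):
    return z * sigmoid(z)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]

class GatedBlock:
    def __init__(self, dim, n_heads, hidden, seed=0):
        assert dim % n_heads == 0, "dim must be divisible by n_heads"
        rng = random.Random(seed)
        self.n_heads, self.hd = n_heads, dim // n_heads
        self.wq, self.wk, self.wv, self.wg, self.wo = [rand_matrix(dim, dim, rng) for _ in range(5)]
        self.w1 = rand_matrix(hidden, dim, rng)  # SwiGLU gate branch
        self.w3 = rand_matrix(hidden, dim, rng)  # SwiGLU linear branch
        self.w2 = rand_matrix(dim, hidden, rng)  # SwiGLU down-projection
        self.g_attn, self.g_ffn = [1.0] * dim, [1.0] * dim  # pre-norm gains
        self.g_qk = [1.0] * self.hd  # QK-norm gain, shared across heads in this sketch

    def split(self, v):
        # Split a dim-sized vector into n_heads chunks of size hd
        return [v[h * self.hd:(h + 1) * self.hd] for h in range(self.n_heads)]

    def attention(self, xs):
        normed = [rms_norm(x, self.g_attn) for x in xs]  # pre-normalization
        # Per-head QK normalization: RMSNorm each head's query and key before the dot product
        q = [[rms_norm(h, self.g_qk) for h in self.split(matvec(self.wq, x))] for x in normed]
        k = [[rms_norm(h, self.g_qk) for h in self.split(matvec(self.wk, x))] for x in normed]
        v = [self.split(matvec(self.wv, x)) for x in normed]
        outs = []
        for i, x in enumerate(normed):
            mixed = []  # concatenated head outputs for instance i
            for h in range(self.n_heads):
                scores = [sum(a * b for a, b in zip(q[i][h], k[j][h])) / math.sqrt(self.hd)
                          for j in range(len(xs))]
                w = softmax(scores)
                mixed += [sum(w[j] * v[j][h][d] for j in range(len(xs))) for d in range(self.hd)]
            # Elementwise output gate in (0, 1), computed from the normalized input
            gate = [sigmoid(z) for z in matvec(self.wg, x)]
            outs.append(matvec(self.wo, [g * m for g, m in zip(gate, mixed)]))
        return outs

    def ffn(self, x):
        n = rms_norm(x, self.g_ffn)  # pre-normalization
        hidden = [silu(a) * b for a, b in zip(matvec(self.w1, n), matvec(self.w3, n))]
        return matvec(self.w2, hidden)

    def __call__(self, xs):
        # Residual connections around both sub-layers
        xs = [[a + b for a, b in zip(x, o)] for x, o in zip(xs, self.attention(xs))]
        return [[a + b for a, b in zip(x, self.ffn(x))] for x in xs]

rng = random.Random(1)
bag = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 instances, dim 8
block = GatedBlock(dim=8, n_heads=2, hidden=16)
out = block(bag)
print(len(out), len(out[0]))  # 5 8
```

In a MIL aggregator, a stack of such blocks would typically be followed by a pooling step (for example a class token or attention pooling) to produce the bag-level prediction; which pooling QG-MIL actually uses is not stated in this summary.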
Community
Hi everyone, thanks for checking out our paper!
In QG-MIL, we study the attention concentration problem in medical Multiple Instance Learning, where attention-based aggregators can collapse onto a small subset of instances and produce unstable or overconfident predictions.
Our goal was to design a simple drop-in MIL aggregator that mitigates this behavior architecturally, without auxiliary losses, masking strategies, or multi-stage training. QG-MIL combines RMSNorm pre-normalization, per-head QK normalization, attention-output gating, and SwiGLU-style feed-forward layers, and we evaluate it across pathology and hematology benchmarks with different bag sizes and feature extractors.
Happy to discuss the method, limitations, ablations, or possible extensions!
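One simple way to quantify the collapse described above is to measure how much of a bag's attention mass the top-k instances receive, together with the normalized entropy of the attention weights. The sketch below is a hypothetical diagnostic: the function names top_k_mass and normalized_entropy and the example weight vectors are illustrative assumptions, and the paper's own attention mass analysis may be defined differently.

```python
import math

# Hypothetical attention-concentration diagnostics; not the paper's exact metric.

def top_k_mass(weights, k):
    # Share of total attention mass held by the k highest-weighted instances
    return sum(sorted(weights, reverse=True)[:k]) / sum(weights)

def normalized_entropy(weights):
    # Shannon entropy divided by log(N): 1.0 means uniform attention,
    # values near 0.0 mean attention has collapsed onto a few instances
    if len(weights) < 2:
        raise ValueError("need at least two instances")
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(weights))

collapsed = [0.90, 0.05, 0.02, 0.01, 0.01, 0.01]  # illustrative weights, not measured data
spread = [0.25, 0.20, 0.18, 0.15, 0.12, 0.10]

for name, w in (("collapsed", collapsed), ("spread", spread)):
    print(name, round(top_k_mass(w, 1), 2), round(normalized_entropy(w), 2))
```

Both numbers are cheap to compute per bag, so they can be averaged over a test set to compare how concentrated different aggregators' attention is.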
Get this paper in your agent: hf papers read 2606.20027