arxiv:2606.20027

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Published on Jun 18 · Submitted by Luca Zedda on Jun 24

Abstract

QG-MIL introduces a gated transformer aggregator for multiple instance learning in medical imaging that stabilizes attention distribution and improves prediction consistency across different medical domains.

Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL across six benchmarks spanning whole-slide pathology and cell-level hematology, covering two fundamentally different MIL scales. The best-performing QG-MIL variants outperform leading baselines on all six benchmarks, with an average improvement of +6.1 mean macro F1 points. Attention overlays and attention mass analysis confirm more distributed instance weighting. Ablation studies show that while individual components can match the full model on specific datasets, the QG-MIL design provides the most consistent cross-domain performance and tightest variance when compared to selected baselines. We release a configurable implementation to support reproducibility at: https://github.com/unica-visual-intelligence-lab/QG-MIL

Community

Paper submitter

Hi everyone, thanks for checking out our paper!

In QG-MIL, we study the attention concentration problem in medical Multiple Instance Learning, where attention-based aggregators can collapse onto a small subset of instances and produce unstable or overconfident predictions.
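As a rough illustration of what attention concentration looks like numerically, the sketch below computes two generic diagnostics for a single bag: the share of attention held by the top-k instances, and the entropy of the attention weights normalized by log(N). These are common ways to quantify concentration and are assumptions here, not necessarily the attention mass analysis used in the paper; the function names and the example scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_mass(weights, k):
    """Fraction of total attention held by the k highest-weighted instances."""
    return sum(sorted(weights, reverse=True)[:k])

def normalized_entropy(weights):
    """Shannon entropy divided by log(N): 1.0 is uniform, near 0.0 is collapsed."""
    n = len(weights)
    if n < 2:
        return 0.0
    h = -sum(w * math.log(w) for w in weights if w > 0.0)
    return h / math.log(n)

# Hypothetical bag of 8 instances: one with a dominant score, one with similar scores.
peaked = softmax([12.0, 0.5, 0.3, 0.1, 0.0, -0.2, -0.4, -0.6])
flat = softmax([1.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4])

print("peaked:", top_k_mass(peaked, 1), normalized_entropy(peaked))
print("flat:  ", top_k_mass(flat, 1), normalized_entropy(flat))
```

In the peaked bag almost all attention sits on one instance and the normalized entropy is close to zero, which is the collapsed regime described above; the flat bag spreads its weight and has entropy close to one.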

Our goal was to design a simple drop-in MIL aggregator that mitigates this behavior architecturally, without auxiliary losses, masking strategies, or multi-stage training. QG-MIL combines RMSNorm pre-normalization, per-head QK normalization, attention-output gating, and SwiGLU-style feed-forward layers, and we evaluate it across pathology and hematology benchmarks with different bag sizes and feature extractors.
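To make the four named components concrete, here is a minimal pure-Python sketch of one pre-norm block that combines them in their commonly used textbook forms. It is not the released implementation: it uses a single head (so the per-head QK normalization reduces to one normalization), omits learned RMSNorm scales, assumes a sigmoid gate computed from the normalized input, and mean-pools at the end. All function names, weight names, and sizes are hypothetical.

```python
import math
import random

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned scale: divide by the root mean square."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention(bag, wq, wk, wv, wg):
    """Single-head self-attention over a bag with QK norm and an output gate."""
    dim = len(bag[0])
    normed = [rms_norm(x) for x in bag]            # pre-normalization
    q = [rms_norm(matvec(wq, x)) for x in normed]  # QK normalization
    k = [rms_norm(matvec(wk, x)) for x in normed]
    v = [matvec(wv, x) for x in normed]
    out = []
    for x, xn, qi in zip(bag, normed, q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
        weights = softmax(scores)
        ctx = [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(dim)]
        gate = [sigmoid(g) for g in matvec(wg, xn)]  # assumed elementwise sigmoid gate
        out.append([xi + gi * ci for xi, gi, ci in zip(x, gate, ctx)])  # residual
    return out

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    hidden = [a * sigmoid(a) * b for a, b in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)

def block(bag, p):
    """Gated attention followed by a pre-norm SwiGLU feed-forward with residual."""
    attended = gated_attention(bag, p["wq"], p["wk"], p["wv"], p["wg"])
    out = []
    for x in attended:
        ffn = swiglu_ffn(rms_norm(x), p["w_gate"], p["w_up"], p["w_down"])
        out.append([xi + fi for xi, fi in zip(x, ffn)])
    return out

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

dim, hidden_dim, n_instances = 4, 8, 5  # toy sizes, chosen only for illustration
params = {name: rand_matrix(dim, dim) for name in ("wq", "wk", "wv", "wg")}
params["w_gate"] = rand_matrix(hidden_dim, dim)
params["w_up"] = rand_matrix(hidden_dim, dim)
params["w_down"] = rand_matrix(dim, hidden_dim)

bag = rand_matrix(n_instances, dim)  # stand-in for extracted instance features
encoded = block(bag, params)
bag_embedding = [sum(col) / n_instances for col in zip(*encoded)]  # assumed mean pooling
print(bag_embedding)
```

Because queries and keys are normalized before the dot product, the attention logits here are bounded by the square root of the feature dimension, which limits how sharply the softmax can peak; the gate then scales each output feature before it is added back to the residual stream.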

Happy to discuss the method, limitations, ablations, or possible extensions!


