arxiv:2606.20027

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Published on Jun 18

Abstract

QG-MIL introduces a gated transformer aggregator for multiple instance learning in medical imaging that stabilizes attention distribution and improves prediction consistency across different medical domains.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL across six benchmarks spanning whole-slide pathology and cell-level hematology, covering two fundamentally different MIL scales. The best-performing QG-MIL variants outperform leading baselines on all six benchmarks, with an average improvement of +6.1 mean macro F1 points. Attention overlays and attention mass analysis confirm more distributed instance weighting. Ablation studies show that while individual components can match the full model on specific datasets, the QG-MIL design provides the most consistent cross-domain performance and tightest variance when compared to selected baselines. We release a configurable implementation to support reproducibility at: https://github.com/unica-visual-intelligence-lab/QG-MIL
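To make the four components concrete, here is a minimal forward-only sketch of one such block in NumPy. It is not the released implementation (see the repository linked above for that). The function names, weight shapes, initialization, the exact form of the gate (an element-wise sigmoid computed from the normalized input), the use of RMSNorm for the per-head QK normalization, and the final mean pooling are all assumptions made for illustration.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over the last axis: rescale by root-mean-square, no mean-centering."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def silu(z):
    return z * sigmoid(z)

def init_params(dim, heads, hidden, seed=0):
    """Hypothetical parameter container; names and init scale are illustrative."""
    rng = np.random.default_rng(seed)
    head_dim = dim // heads
    w = lambda *shape: rng.normal(0.0, 0.02, size=shape)
    return {
        "heads": heads,
        "norm1": np.ones(dim), "norm2": np.ones(dim),
        "q_norm": np.ones(head_dim), "k_norm": np.ones(head_dim),
        "wq": w(dim, dim), "wk": w(dim, dim), "wv": w(dim, dim),
        "wg": w(dim, dim), "wo": w(dim, dim),
        "w_gate": w(dim, hidden), "w_up": w(dim, hidden), "w_down": w(hidden, dim),
    }

def gated_block(x, p):
    """x: (n_instances, dim) bag of instance embeddings -> (output, attention)."""
    n, dim = x.shape
    h = p["heads"]
    d = dim // h
    split = lambda t: t.reshape(n, h, d).transpose(1, 0, 2)  # (h, n, d)

    # 1) Pre-normalization with RMSNorm before the attention sub-layer.
    y = rms_norm(x, p["norm1"])
    # 2) Per-head QK normalization: normalize each head's queries and keys.
    q = rms_norm(split(y @ p["wq"]), p["q_norm"])
    k = rms_norm(split(y @ p["wk"]), p["k_norm"])
    v = split(y @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (h, n, n)
    ctx = (attn @ v).transpose(1, 0, 2).reshape(n, dim)
    # 3) Element-wise sigmoid gate on the attention output (assumed form).
    ctx = ctx * sigmoid(y @ p["wg"])
    x = x + ctx @ p["wo"]
    # 4) SwiGLU-style feed-forward: SiLU(gate) * up, projected back down.
    z = rms_norm(x, p["norm2"])
    x = x + (silu(z @ p["w_gate"]) * (z @ p["w_up"])) @ p["w_down"]
    return x, attn

bag = np.random.default_rng(1).normal(size=(12, 32))  # 12 instances, 32-dim features
out, attn = gated_block(bag, init_params(dim=32, heads=4, hidden=64))
bag_embedding = out.mean(axis=0)  # mean pooling over instances, an assumption
```

In this sketch the QK normalization bounds the magnitude of the attention logits, and the gate can scale down each attention output channel independently. Both are plausible routes to the less peaked attention the abstract reports, though the paper itself should be consulted for the actual mechanism and hyperparameters.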


Community

Snarcy

Paper submitter 1 day ago

Hi everyone, thanks for checking out our paper!

In QG-MIL, we study the attention concentration problem in medical Multiple Instance Learning, where attention-based aggregators can collapse onto a small subset of instances and produce unstable or overconfident predictions.
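As a rough illustration of what "collapsing onto a small subset of instances" means numerically, the sketch below computes two common concentration measures for a vector of attention weights: the mass held by the top-k instances and the entropy normalized by log(n). These are generic diagnostics chosen for illustration, not necessarily the attention mass analysis used in the paper, and the example weights are made up.

```python
import math

def top_k_mass(weights, k):
    """Fraction of total attention held by the k highest-weighted instances."""
    total = sum(weights)
    return sum(sorted(weights, reverse=True)[:k]) / total

def normalized_entropy(weights):
    """Shannon entropy divided by log(n): 1.0 is uniform, near 0.0 is collapsed."""
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two instances to normalize entropy")
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]  # skip zeros: 0 * log(0) -> 0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)

collapsed = [0.91, 0.03, 0.02, 0.02, 0.01, 0.01]  # illustrative, not from the paper
uniform = [1 / 6] * 6

print(top_k_mass(collapsed, k=1), normalized_entropy(collapsed))
print(top_k_mass(uniform, k=1), normalized_entropy(uniform))
```

A bag whose attention sits almost entirely on one instance yields a high top-1 mass and a low normalized entropy, while evenly spread attention pushes the entropy toward 1.0.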

Our goal was to design a simple drop-in MIL aggregator that mitigates this behavior architecturally, without auxiliary losses, masking strategies, or multi-stage training. QG-MIL combines RMSNorm pre-normalization, per-head QK normalization, attention-output gating, and SwiGLU-style feed-forward layers, and we evaluate it across pathology and hematology benchmarks with different bag sizes and feature extractors.

Happy to discuss the method, limitations, ablations, or possible extensions!
