Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand qgallouedec • Dec 4, 2025 • 72
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 Text Generation • 335B • Updated 7 days ago • 232k • 190
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 5 days ago • 158
Space • Running • The ultimate guide to RL environments: building and scaling them in the LLM era 📝 Building and scaling RL environments for LLM training • 186
NITP: Next Implicit Token Prediction for LLM Pre-training Paper • 2605.24956 • Published 24 days ago • 35
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 22 days ago • 141
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? Paper • 2605.08985 • Published May 9 • 23