All HF Hub posts
sergiopaniego
posted an update 2 days ago
Post
6048
new banger blog alert 🚨
@ariG23498 is starting a blog series about profiling in pytorch and part 1 just dropped
takes you from the simplest scenario to actually knowing what your gpu is doing. if you have never opened a profiler trace this is where you start
covers torch.profiler from scratch: reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood
find it here: https://huggingface.co/blog/torch-profiler
hypothetical
posted an update about 23 hours ago
Post
973
The smallest, highest-quality Gemma4 E2B and E4B models in the world! 7x compression, from 9.3GB -> 1.4GB!
TheStageAI/gemma-4-E2B-it
TheStageAI/gemma-4-E4B-it
Post
953
Hugging Face MCP Server v0.3.17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SEP-2640 "Skills Over MCP" support added (early access)
Post
670
New blog post!
An introduction to a little-known but highly effective model reduction method: 𝗧𝗿𝗶𝗺𝗺𝗶𝗻𝗴
We show how to reduce model size (we went up to 87.24% reduction) while preserving its performance.
We applied this technique to 16 different model families across several modalities to illustrate that it works on any architecture (as long as the embedding layer is the last one of the model) and on any modality involving text.
From these 16 families, we generated over 𝟱,𝟱𝟬𝟬 𝗺𝗼𝗻𝗼𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝗺𝗼𝗱𝗲𝗹𝘀 𝗶𝗻 𝟭𝟮𝟰 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲𝘀
Key takeaways from our experiments:
1️⃣ Trimming does not require a GPU. Our models were obtained on a CPU.
2️⃣ This method scales up to at least 4B parameters (we did not test beyond that).
3️⃣ The trimmed model is smaller than the original while preserving its performance. If you observe a slight performance drop, just fine-tune it to recover or even surpass the original performance.
4️⃣ For an equivalent compute budget, it is better to trim and then fine-tune than to fine-tune the original model. Since the model is smaller, you can run more epochs or show it more data and ultimately get a better model than the original.
5️⃣ Trimming is a competitive alternative to distillation and quantization. E.g. we obtained our alternative to DistilBERT in 9 minutes on CPU vs. 90 hours of GPU for the latter.
6️⃣ Trimming could make it possible to generate reasoning traces directly in the language of the trimmed model. This could be an alternative to generating traces in English and then translating them into the desired language.
And many other things (such as how much data is needed, the impact of the database used, the order in which it should be done, etc.) are available in the blogpost!
Blogpost: https://huggingface.co/blog/lbourdois/introduction-to-trimming
Models: alphaedge-ai/Trimming_models_search
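The post does not spell out the mechanism, so here is a minimal sketch under one loud assumption: that trimming means dropping the embedding rows of tokens the target language never uses, then remapping the remaining token ids. The function names, the toy embedding matrix, and the kept ids are all hypothetical, and real models would also need the tokenizer updated to match.

```python
def trim_vocabulary(embeddings, kept_token_ids):
    """Keep only the selected embedding rows (assumed meaning of trimming).

    Returns the trimmed matrix and a map from old token id to new token id.
    """
    kept = sorted(set(kept_token_ids))
    old_to_new = {old: new for new, old in enumerate(kept)}
    trimmed = [embeddings[old] for old in kept]
    return trimmed, old_to_new


def size_reduction_percent(original_params, trimmed_params):
    """Relative size reduction, in percent."""
    return 100.0 * (1 - trimmed_params / original_params)


# Hypothetical toy model: 6 tokens, embedding dimension 2.
embeddings = [[float(i), float(i) + 0.5] for i in range(6)]

# Hypothetical: token ids observed in a corpus of the target language.
kept_token_ids = [4, 0, 2, 2]

trimmed, old_to_new = trim_vocabulary(embeddings, kept_token_ids)

original_params = len(embeddings) * len(embeddings[0])
trimmed_params = len(trimmed) * len(trimmed[0])

print(old_to_new)
print(size_reduction_percent(original_params, trimmed_params))
```

In this toy case only the embedding table exists, so the reduction is large. In a real model the achievable reduction depends on how much of the parameter count sits in the embedding layer.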
RakshitAralimatti
posted an update 2 days ago
Post
423
Reading engineering and research blogs from OpenAI, Anthropic, DeepMind, Meta and others has genuinely leveled up my understanding of AI systems and helped me in my day-to-day work. But keeping track of 20+ sites manually is a pain.
So I built AI Blogs Tracker, a Streamlit app that scrapes the actual blog listing pages (not search) of 20+ top AI companies and surfaces titles, dates, and links in one clean feed. Filter by source or by date, star posts to a reading list, or add your own custom sources.
One click. ~30 seconds. Everything in one place.
GitHub link - https://github.com/rakshit2020/Tech-Blogs-Tracker-of-Top-AI-Companies-Agent
RiverRider
posted an update 1 day ago
Post
927
This is not the end of words. It is the end of pretending their meanings are determined.
Meaning Forks. SRT detects it.
Paste any text to identify contested terms
RiverRider/srt-introspect
Try any prompt (attached link) to see exactly what an LLM is thinking at every meaningful step of its answer
RiverRider/srt-introspect
Repository
https://github.com/space-bacon/SRT
Paper
https://github.com/space-bacon/SRT/blob/main/paper_nla.md
Explainer
https://github.com/space-bacon/SRT/blob/main/docs/EXPLAINERS.md
Post
961
Qwen Image Edit 2511 Fast + LoRA ⚡
ovi054/Qwen-Image-Edit-2511-LoRA
QIE-2511 is an image editing model with integrated LoRA capabilities. You can add any custom LoRA to generate and edit images within this Space.
Try it now: ovi054/Qwen-Image-Edit-2511-LoRA
kanaria007
posted an update 1 day ago
Post
100
Article highlight: *Deployment & Rollback Governance for Learning Worlds* (art-60-169, v0.1)
TL;DR:
This article argues that deployment is the highest-risk moment in a learning world.
Training produces a new policy. Deployment turns that policy into an institution inside the world. So rollout cannot be treated like a casual model swap. It needs deploy-gate contracts, canaries, phased rollout, kill-switches, rollback receipts, and explicit non-interference rules that stop “better learning” from silently rewriting world reality.
Read:
kanaria007/agi-structural-intelligence-protocols
Why it matters:
• treats deployment as governed change, not routine ops
• prevents silent reality drift when a newly trained policy changes world outcomes
• binds rollout to safety envelopes, evaluation validity, performance SLOs, and canon boundaries
• makes rollback and emergency stop part of the formal operating contract
What’s inside:
• a *model deploy gate contract* that defines when a learned policy may enter the world
• canary and phased rollout as explicit governed stages
• kill-switch and rollback receipts for emergency containment
• non-interference audits so training and deployment do not rewrite canon or governance outcomes
• appeal and publication boundaries for claims like “we deployed safely” or “we rolled back successfully”
Key idea:
Do not say:
*“we trained a better model, so we deployed it.”*
Say:
*“this policy entered the world under this deploy gate, this rollout stage, these envelope and SLO checks, these rollback guarantees, and these receipts.”*
That is how deployment becomes governance with receipts.
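A minimal sketch of how a deploy gate, staged rollout, and rollback receipts could fit together. Every name here (DeployGate, Rollout, the two thresholds, the receipt fields) is hypothetical and not taken from the article; the error-rate and latency checks are stand-ins for whatever envelope and SLO checks a real contract would define.

```python
from dataclasses import dataclass, field


@dataclass
class DeployGate:
    """Hypothetical gate: checks that must pass before each rollout stage."""
    max_error_rate: float   # stand-in for a safety envelope
    max_latency_ms: float   # stand-in for a performance SLO
    stages: tuple = ("canary", "phased", "full")


@dataclass
class Rollout:
    gate: DeployGate
    policy_id: str
    previous_policy_id: str
    stage_index: int = -1   # -1 means the new policy is not deployed
    receipts: list = field(default_factory=list)

    def advance(self, error_rate, latency_ms):
        """Move to the next stage if the checks pass, otherwise roll back."""
        passed = (error_rate <= self.gate.max_error_rate
                  and latency_ms <= self.gate.max_latency_ms)
        if not passed:
            return self.rollback(reason="gate check failed")
        if self.stage_index < len(self.gate.stages) - 1:
            self.stage_index += 1
        stage = self.gate.stages[self.stage_index]
        self.receipts.append(
            {"event": "advance", "policy": self.policy_id, "stage": stage}
        )
        return stage

    def rollback(self, reason):
        """Kill-switch path: restore the previous policy and record why."""
        self.stage_index = -1
        self.receipts.append(
            {"event": "rollback",
             "restored": self.previous_policy_id,
             "reason": reason}
        )
        return "rolled_back"


rollout = Rollout(DeployGate(0.01, 200.0), "policy-v2", "policy-v1")
first = rollout.advance(error_rate=0.005, latency_ms=120.0)
second = rollout.advance(error_rate=0.02, latency_ms=120.0)
print(first, second, len(rollout.receipts))
```

The point of the design is that every state change, including the emergency stop, leaves a receipt, so a claim like “we rolled back successfully” can be checked against a record.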
sergiopaniego
posted an update 1 day ago
Post
924
most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE isn't invertible, so decode then re-encode can land on different ids. gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training
@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above
go read it 🤗
https://qgallouedec-tito.hf.space/
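a toy illustration of the round-trip problem, assuming a made-up four-token vocabulary and a greedy longest-match encoder standing in for real BPE merges. none of the ids or helper names come from the blog or from any real tokenizer. labeling the two paths as MITO-style and TITO-style is a reading of the acronyms, not the blog's wording.

```python
# Toy vocabulary: ids are hypothetical, not from any real tokenizer.
VOCAB = {0: "a", 1: "b", 2: "ab", 3: "c"}
TOKEN_TO_ID = {text: tid for tid, text in VOCAB.items()}


def decode(ids):
    """Concatenate token strings, as a tokenizer's decode step would."""
    return "".join(VOCAB[i] for i in ids)


def encode(text):
    """Greedy longest-match encoding (a stand-in for BPE merges)."""
    ids = []
    pos = 0
    max_len = max(len(t) for t in TOKEN_TO_ID)
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


# Ids the model actually sampled: "a" and "b" as two separate tokens.
sampled_ids = [0, 1, 3]

# MITO-style round trip (assumed): decode to text, then re-tokenize.
retokenized_ids = encode(decode(sampled_ids))

# TITO-style (assumed): keep the sampled ids and never re-encode them.
kept_ids = list(sampled_ids)

print(sampled_ids, retokenized_ids, sampled_ids == retokenized_ids)
# prints: [0, 1, 3] [2, 3] False
```

both id sequences decode to the same string "abc", which is why nothing crashes: the text matches, but a loss computed on `retokenized_ids` would be attached to token 2, which the model never sampled.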