
khtsly

AI & ML interests

None yet

Organizations

None yet

upvoted a paper about 1 hour ago

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Paper • 2606.12397 • Published 1 day ago • 73

upvoted a paper 1 day ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Paper • 2606.11052 • Published 3 days ago • 13

upvoted a paper 2 days ago

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Paper • 2606.09079 • Published 4 days ago • 56

New activity in sapientinc/HRM-Text-1B 5 days ago

Hrm can't calculate 2+2

#8 opened 7 days ago by Xhub1880

commented a paper 7 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 516

upvoted 2 papers 7 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 516

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

Paper • 2605.29707 • Published 15 days ago • 145

upvoted a paper 8 days ago

dMoE: dLLMs with Learnable Block Experts

Paper • 2605.30876 • Published 14 days ago • 36

upvoted a paper 9 days ago

NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published 19 days ago • 35

upvoted a paper 17 days ago

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Paper • 2605.23901 • Published 21 days ago • 13

upvoted a paper 19 days ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published about 1 month ago • 195

upvoted 2 papers 20 days ago

HRM-Text: Efficient Pretraining Beyond Scaling

Paper • 2605.20613 • Published 23 days ago • 313

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published 22 days ago • 31

upvoted a paper 21 days ago

Generative Recursive Reasoning

Paper • 2605.19376 • Published 23 days ago • 29

liked a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 109 • 2

published a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 109 • 2

updated a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 109 • 2

updated a dataset about 2 months ago

khtsly/roblox_docs_corpus_text

Viewer • Updated Apr 23 • 1.55k • 20 • 1

New activity in Jackrong/Qwopus-GLM-18B-Merged-GGUF about 2 months ago

merging problem

#5 opened about 2 months ago by khtsly

New activity in google/gemma-4-31B-it about 2 months ago

Can anyone improve the model using the Rys methodology—by duplicating a block of layers?

#60 opened 2 months ago by Regrin
