🚀 __Monostep v1__ is up → [Monostep-v1 Demo](wop/Cosmos-T2-Chat)
A tiny (~16.6M parameters) experimental model that predicts 4 tokens per forward pass instead of one. A Transformer trunk pools the prompt into a single vector, then 4 sequential "slot" heads emit a block of tokens left-to-right: a lightweight take on multi-token prediction.
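For anyone curious what "pool, then fill 4 slots" looks like as a loop, here is a rough toy sketch in plain Python. It is not the code from the repo: the mean-pooling trunk, the random weights, the sizes (`VOCAB`, `DIM`), the `EOS` id, and the way each slot is conditioned on the previous one (adding the emitted token's embedding to the state) are all assumptions made up for illustration. Only the "4 tokens per forward pass, left-to-right" shape comes from the description above.

```python
import math
import random

VOCAB = 50   # toy vocabulary size (assumption; the real model uses the GPT-2 tokenizer)
DIM = 8      # toy hidden size (assumption)
BLOCK = 4    # tokens emitted per forward pass, as described in the post
EOS = 0      # hypothetical end-of-sequence token id

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


EMBED = rand_matrix(VOCAB, DIM)                            # token embeddings
SLOT_HEADS = [rand_matrix(VOCAB, DIM) for _ in range(BLOCK)]  # one head per slot


def pool(tokens):
    """Stand-in for the Transformer trunk: mean-pool embeddings into one vector."""
    vec = [0.0] * DIM
    for tok in tokens:
        for i in range(DIM):
            vec[i] += EMBED[tok][i]
    return [v / len(tokens) for v in vec]


def forward(tokens):
    """One 'forward pass': returns BLOCK token ids, filled left to right."""
    state = pool(tokens)
    block = []
    for head in SLOT_HEADS:
        logits = [sum(w * s for w, s in zip(row, state)) for row in head]
        tok = max(range(VOCAB), key=logits.__getitem__)  # greedy pick
        block.append(tok)
        # Assumed conditioning: later slots see the tokens emitted so far.
        state = [s + e for s, e in zip(state, EMBED[tok])]
    return block


def generate(prompt, max_new_tokens):
    """Block decoding: append up to BLOCK tokens per pass until EOS or the budget."""
    tokens = list(prompt)
    new_tokens = []
    passes = 0
    while len(new_tokens) < max_new_tokens:
        passes += 1
        for tok in forward(tokens):
            tokens.append(tok)
            new_tokens.append(tok)
            if tok == EOS or len(new_tokens) >= max_new_tokens:
                return new_tokens, passes
    return new_tokens, passes


if __name__ == "__main__":
    out, passes = generate([5, 7, 11], max_new_tokens=10)
    print(f"{len(out)} tokens in {passes} forward passes: {out}")
```

The point of the sketch is the pass count: generating N tokens takes about N/4 trunk evaluations instead of N, at the cost of every token in a block depending on the same pooled summary of the context.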
Trained on GSM8K (GPT-2 tokenizer, 10 epochs). It's small and rough — answers are often wrong — but it's a fun little testbed for block decoding. Weights, config, training curves, and a self-contained inference snippet are all in the repo.
Also wired into the Cosmos T2-Accelerate chat demo, where it streams those 4-token blocks live. 🧪
#multitokenprediction #gsm8k #smallmodels