TinyStories 2M (tinygpt1m) GGUF & HF Validation Suite with GPT-2 Style Byte-Level BPE

This repository provides an ultra-lightweight Llama-style validation model with a ~1M parameter footprint, trained from scratch on the TinyStories dataset. It is designed to stress-test custom inference engines against edge-case boundary conditions, non-uniform memory alignment, and sparse vocabulary handling.

The network architecture deliberately avoids common power-of-two and multiple-of-32 dimensions in order to expose hidden assumptions, hardcoded boundaries, and alignment-dependent optimized loops in the engine under test.
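As a quick sanity check, the sketch below classifies the dimensions listed under Model Specifications against the usual alignment assumptions. The helper names (`is_power_of_two`, `is_prime`, `describe_dim`) and the 32-element lane width are illustrative choices, not part of this repository.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def is_prime(n: int) -> bool:
    """Trial-division primality test; sufficient for small dimensions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def describe_dim(name: str, n: int, lanes: int = 32) -> dict:
    """Summarize how a dimension relates to common alignment assumptions."""
    return {
        "name": name,
        "value": n,
        "power_of_two": is_power_of_two(n),
        "multiple_of_lanes": n % lanes == 0,
        "padded_to_lanes": -(-n // lanes) * lanes,  # round up to next multiple
        "prime": is_prime(n),
    }


# Dimensions taken from the model specifications in this README.
DIMS = {"hidden_size": 234, "head_dim": 26, "intermediate_size": 521}
REPORT = {name: describe_dim(name, n) for name, n in DIMS.items()}

if __name__ == "__main__":
    for row in REPORT.values():
        print(row)
```

An engine that silently pads rows to a multiple of 32 would have to treat 234 as 256 and 521 as 544, so any kernel that forgets to mask the padded tail will read or accumulate garbage.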


📂 Repository Structure

.
├── hf/                      # Hugging Face / Safetensors native weight formats
│   ├── config.json
│   ├── generation_config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── special_tokens_map.json
└── tinygpt1m.F32.gguf       # llama.cpp / Custom engine GGUF binary format (F32)

📐 Model Specifications

To validate the numerical and algorithmic correctness of inference engines, the model uses the following non-standard, asymmetric topology:

  • Architecture: Llama (LlamaForCausalLM)

  • Total Parameters: ~1M-class footprint (the model page reports 1.96M parameters for the GGUF file)

  • Hidden Size (hidden_size): 234

      • A highly irregular dimension that is neither a power of two nor a multiple of 32. Faulty memory padding or rigid SIMD kernel alignment assumptions are likely to surface as a segmentation fault or visible numerical corruption.

  • Number of Attention Heads (num_attention_heads / num_key_value_heads): **9 / 3**

      • Replicates the odd head count and non-power-of-two GQA layout (Group Size = 3) found in SmolLM2. This layout breaks head-loop and key-value indexing code that assumes an even head count or relies on bit-shift arithmetic.

  • Head Dimension (head_dim): 26 ($234 \div 9$)

      • Chosen to be even, because Hugging Face's RoPE implementation splits each head into two halves and concatenates them back, while remaining atypical relative to the standard 64 or 128.

  • FFN Intermediate Dimension (intermediate_size): 521 (Prime Number)

      • A prime size rules out any padding or tiling shortcut that depends on divisibility during matrix multiplication.

  • Number of Hidden Layers (num_hidden_layers): 2
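The topology above implies specific tensor sizes and a specific query-to-KV head mapping. The following is a minimal sketch of both, assuming a standard bias-free Llama block (q/k/v/o projections, gate/up/down MLP, two RMSNorm weights per layer). The vocabulary size of 2002 (highest token ID plus one) and the untied input/output embeddings are assumptions, not values stated in this README; `Spec`, `kv_head_for`, and `param_count` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    hidden_size: int = 234
    num_attention_heads: int = 9
    num_key_value_heads: int = 3
    intermediate_size: int = 521
    num_hidden_layers: int = 2
    vocab_size: int = 2002        # ASSUMPTION: highest token ID (2001) + 1
    tie_embeddings: bool = False  # ASSUMPTION: separate input/output embeddings

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_attention_heads

    @property
    def group_size(self) -> int:
        return self.num_attention_heads // self.num_key_value_heads


def kv_head_for(spec: Spec, q_head: int) -> int:
    """KV head shared by a query head: integer division, not a bit shift."""
    return q_head // spec.group_size


def param_count(spec: Spec) -> dict:
    """Count weights for a bias-free Llama-style stack under the Spec assumptions."""
    h, d = spec.hidden_size, spec.head_dim
    q_width = spec.num_attention_heads * d
    kv_width = spec.num_key_value_heads * d
    attn = h * q_width + 2 * h * kv_width + q_width * h  # q, k, v, o
    mlp = 3 * h * spec.intermediate_size                 # gate, up, down
    norms = 2 * h                                        # two RMSNorm weights
    blocks = spec.num_hidden_layers * (attn + mlp + norms) + h  # + final norm
    embed = spec.vocab_size * h
    embeddings = embed if spec.tie_embeddings else 2 * embed
    return {"blocks": blocks, "embeddings": embeddings, "total": blocks + embeddings}


SPEC = Spec()
COUNTS = param_count(SPEC)
KV_MAP = [kv_head_for(SPEC, q) for q in range(SPEC.num_attention_heads)]

if __name__ == "__main__":
    print(COUNTS)
    print(KV_MAP)
```

Under these assumptions the transformer blocks hold roughly 1.02M weights and the two embedding matrices roughly 0.94M, for a total of about 1.96M. With a group size of 3, query heads 0 to 2 share KV head 0, heads 3 to 5 share KV head 1, and heads 6 to 8 share KV head 2.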


🔀 Tokenizer Specifications

The tokenizer uses a deliberately adversarial, sparse token ID layout designed to validate Jules' dynamic resizing implementation for non-contiguous index structures:

  • Tokenizer Type: GPT-2 Style Byte-Level BPE (llama-bpe mapping specification)

  • Base Vocabulary Size: 1,009 (Prime Number)

  • Sparse Added Tokens: Special control tokens are explicitly assigned to IDs widely separated from the core token sequence, forcing a large gap in the array mapping.

      • id: 2000 ➔ <|im_start|>

      • id: 2001 ➔ <s> (BOS token pushed to the very end)

      • id: 0 ➔ </s> (EOS token pulled to the very beginning)

  • Conceptual Inversion: Mapping BOS to a high index (2001) and EOS to 0 inverts the conventional 1 or 2 assignments, which exposes hardcoded token ID assumptions in custom pipeline logic.
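One way an engine can cope with this layout is to size its ID table by the highest token ID rather than by the number of tokens, and to treat gaps as explicit holes. The sketch below illustrates that idea using only the three special tokens documented above; the base vocabulary entries are omitted because this README does not list them, and `build_id_table` and `lookup` are hypothetical helpers.

```python
from typing import Dict, List, Optional

# Special token IDs as documented above.
SPECIAL_TOKENS: Dict[int, str] = {0: "</s>", 2000: "<|im_start|>", 2001: "<s>"}


def build_id_table(tokens: Dict[int, str]) -> List[Optional[str]]:
    """Dense id -> token table sized by the highest ID; None marks a gap."""
    size = max(tokens) + 1
    table: List[Optional[str]] = [None] * size
    for token_id, text in tokens.items():
        table[token_id] = text
    return table


def lookup(table: List[Optional[str]], token_id: int) -> str:
    """Resolve an ID, raising instead of reading out of range or into a gap."""
    if not 0 <= token_id < len(table) or table[token_id] is None:
        raise KeyError(f"token id {token_id} is not defined")
    return table[token_id]


TABLE = build_id_table(SPECIAL_TOKENS)

if __name__ == "__main__":
    print(len(TABLE), lookup(TABLE, 2001), lookup(TABLE, 0))
```

A table sized by token count alone would be too short to hold ID 2001, which is the kind of out-of-bounds access this tokenizer is meant to provoke.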


🚀 Usage

Hugging Face / Transformers (Python)

import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

model_dir = "./tinygpt1m/hf"
# Load the tokenizer directly from tokenizer.json
tokenizer = PreTrainedTokenizerFast(tokenizer_file=f"{model_dir}/tokenizer.json")
model = LlamaForCausalLM.from_pretrained(model_dir)
model.eval()

prompt = "Once upon"
# Prepend 2001 (BOS) manually to match the specialized training distribution
input_ids = [2001] + tokenizer.encode(prompt, add_special_tokens=False)
input_tensor = torch.tensor([input_ids])

with torch.no_grad():
    output_ids = model.generate(
        input_tensor,
        attention_mask=torch.ones_like(input_tensor),  # single unpadded sequence
        max_length=32,    # total length, prompt tokens included
        do_sample=False,  # greedy decoding for reproducible validation
        pad_token_id=0,
        bos_token_id=2001,
        eos_token_id=0,
    )

print(tokenizer.decode(output_ids[0]))
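Before comparing engine output against the reference above, it can help to confirm that the local checkpoint matches the documented topology. This is a minimal sketch that assumes `config.json` uses the standard Llama configuration keys; `config_mismatches` and `load_config` are hypothetical helpers, not part of this repository.

```python
import json
from pathlib import Path

# Expected values as documented in the model specifications.
EXPECTED = {
    "hidden_size": 234,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "intermediate_size": 521,
    "num_hidden_layers": 2,
}


def config_mismatches(cfg: dict) -> list:
    """Return (key, expected, actual) for every documented value that differs."""
    return [
        (key, want, cfg.get(key))
        for key, want in EXPECTED.items()
        if cfg.get(key) != want
    ]


def load_config(model_dir: str) -> dict:
    """Read config.json from a Hugging Face model directory."""
    return json.loads(Path(model_dir, "config.json").read_text(encoding="utf-8"))


if __name__ == "__main__":
    model_dir = "./tinygpt1m/hf"
    if Path(model_dir, "config.json").exists():
        print(config_mismatches(load_config(model_dir)) or "config matches README")
```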

llama.cpp / Custom Engine (GGUF)

# Verify behavior using llama.cpp completion binary
./llama-completion -m tinygpt1m.F32.gguf -p "Once upon" -n 20
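For a custom engine, a useful first check is that the GGUF header parses before any tensor data is touched. The sketch below assumes the little-endian GGUF v2/v3 header layout (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key-value count); `GGUFHeader`, `parse_gguf_header`, and `read_gguf_header` are hypothetical names, and the counts stored in `tinygpt1m.F32.gguf` are not stated in this README.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the fixed 24-byte header (ASSUMPTION: little-endian GGUF v2/v3)."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    magic, version, tensors, kvs = struct.unpack("<4sIQQ", data[:24])
    if magic != b"GGUF":
        raise ValueError(f"bad magic: {magic!r}")
    return GGUFHeader(version, tensors, kvs)


def read_gguf_header(path: str) -> GGUFHeader:
    """Read and parse the header from a GGUF file on disk."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(24))


if __name__ == "__main__":
    import os

    if os.path.exists("tinygpt1m.F32.gguf"):
        print(read_gguf_header("tinygpt1m.F32.gguf"))
```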

📜 Acknowledgments

This validation model was trained from scratch on the TinyStories dataset. It serves as a long-lived regression guardrail for evaluating attention kernels, dynamic token allocation arrays, and sampling routines across custom LLM hardware inference runtimes.
