MiniCPM5-1B (LiteRT-LM)

This repository hosts the LiteRT-LM build of MiniCPM5-1B, optimized for fully on-device inference on mobile and edge hardware. (LiteRT was formerly known as TensorFlow Lite.)


Available Models

  • minicpm_dynamic_wi8_afp32_gpu_opt.litertlm: Uses dynamic weight-only INT8 quantization (wi8) with FP32 activations (afp32) and is optimized for GPU execution.
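
To make the wi8/afp32 naming concrete, here is a minimal sketch of weight-only INT8 quantization with floating-point activations. It is a generic illustration, not the LiteRT-LM kernel: the symmetric per-row scaling scheme and the function names are assumptions, and Python floats (64-bit) stand in for FP32.

```python
def quantize_row_int8(row):
    """Symmetric quantization of one weight row to int8 values plus a float scale."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q_row = [max(-127, min(127, round(w / scale))) for w in row]
    return q_row, scale


def linear_wi8(quantized_rows, activations):
    """Compute y = W x where W is stored as int8 and x stays in floating point."""
    return [
        scale * sum(q * x for q, x in zip(q_row, activations))
        for q_row, scale in quantized_rows
    ]


# Toy example: quantize once (offline), then reuse the int8 weights at inference.
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
activations = [1.0, 2.0, 3.0]

quantized = [quantize_row_int8(row) for row in weights]
approx = linear_wi8(quantized, activations)
exact = [sum(w * x for w, x in zip(row, activations)) for row in weights]
```

The int8 weights take one byte each instead of four, while the activations and the accumulated result remain in floating point, so the output differs from the full-precision result only by a small rounding error.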

What is MiniCPM?

MiniCPM5-1B is the first model in the MiniCPM5 series from OpenBMB. It is a dense 1B-parameter Transformer built specifically for on-device, local, and resource-constrained deployment, while reaching open-source state-of-the-art results in the 1B size class.

Highlights

  • 🏆 1B-class open-source SOTA: strongest in tool use, code generation, and difficult reasoning among comparable open models.
  • 🧠 Hybrid Reasoning: a single checkpoint serves as both a fast assistant and a deliberate reasoner via a built-in <think> template (enable_thinking).
  • 📏 Long context: native 131,072-token context length.
  • 📱 Built for the edge: compact footprint designed for local assistants, coding agents, and tool-use workflows.
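
When thinking is enabled, an application usually wants to separate the reasoning from the final answer. The sketch below assumes the model wraps its reasoning in a `<think>...</think>` pair; the closing tag and the helper name `split_reasoning` are assumptions for illustration, not something this card specifies.

```python
import re

# Assumed output format: optional "<think>...</think>" followed by the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is empty if no think block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the user-facing answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

A caller could then show only the answer by default and keep the reasoning for logging or debugging.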

Model Information

Item                      Value
Type                      Causal Language Model
Architecture              Standard LlamaForCausalLM
Parameters                1,080,632,832 (~1B)
Non-Embedding Parameters  679,552,512
Layers                    24
Attention Heads (GQA)     16 (Q) / 2 (KV)
Context Length            131,072
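
A few quantities follow directly from the figures above. The sketch below derives the embedding parameter count and the query-to-KV head ratio, and gives rough size estimates. The head dimension is not listed in this card, so `head_dim` is a caller-supplied assumption, and the weight estimate ignores quantization scales and any tensors kept at higher precision.

```python
TOTAL_PARAMS = 1_080_632_832
NON_EMBEDDING_PARAMS = 679_552_512
LAYERS = 24
Q_HEADS = 16
KV_HEADS = 2

# Parameters that live in the embedding tables: 401,080,320.
embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS

# With GQA, each KV head is shared by this many query heads: 8.
queries_per_kv_head = Q_HEADS // KV_HEADS


def weight_bytes(num_params, bytes_per_param):
    """Rough weight storage: 1 byte per parameter for INT8, 4 for FP32."""
    return num_params * bytes_per_param


def kv_cache_bytes(tokens, head_dim, bytes_per_value):
    """Rough KV-cache size; head_dim is an assumption, not given in this card."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * LAYERS * KV_HEADS * head_dim * bytes_per_value * tokens
```

For example, `weight_bytes(TOTAL_PARAMS, 1)` gives about 1.08 GB for INT8 weights versus about 4.32 GB at FP32. Because only 2 KV heads are cached instead of 16, the KV cache is 8 times smaller than it would be with one KV head per query head.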

Use the model

Android

Edge Gallery App

  • Install the app from Google Play, or download or build it from GitHub.
  • Follow the instructions in the app.

To build the demo app from source, please follow the instructions from the GitHub repository.

Try It (Desktop/CLI)

With uv installed, install the LiteRT-LM CLI and run the model directly from the command line:

uv tool install litert-lm
uvx litert-lm run --from-huggingface-repo=litert-community/MiniCPM5-1B minicpm_dynamic_wi8_afp32_gpu_opt.litertlm --prompt="What is the capital of France?"
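
For scripting, the same command can be assembled from Python. This is a minimal sketch that reuses only the arguments shown above; the helper names are hypothetical, and it assumes the CLI writes the model's response to standard output.

```python
import subprocess

REPO = "litert-community/MiniCPM5-1B"
MODEL_FILE = "minicpm_dynamic_wi8_afp32_gpu_opt.litertlm"


def build_command(prompt):
    """Build the argument list for the litert-lm invocation shown above."""
    return [
        "uvx", "litert-lm", "run",
        f"--from-huggingface-repo={REPO}",
        MODEL_FILE,
        f"--prompt={prompt}",
    ]


def run_prompt(prompt):
    """Run the CLI once and return its stdout (assumed to hold the response)."""
    result = subprocess.run(
        build_command(prompt), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains spaces or quotes.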

Links


License

Released under the Apache-2.0 License, consistent with the upstream openbmb/MiniCPM5-1B.

Citation

@article{minicpm4,
  title={MiniCPM4: Ultra-efficient LLMs on end devices},
  author={MiniCPM, Team},
  journal={arXiv preprint arXiv:2506.07900},
  year={2025}
}