LFM2-8B-A1B-GLM-4.7-Flash-Thinking-Quantum-IQ1C-P

Fine-tune of "LFM2-8B-A1B" trained with Unsloth on custom dataset(s), with 128k context, in 16-bit precision.

This is a sparse mixture-of-experts model (32 experts) with 4 experts activated.
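To illustrate what "4 of 32 experts activated" means, here is a minimal sketch of generic top-k expert routing. It is not taken from this model's code: the `route` helper, the random router logits, and the choice to softmax only over the selected experts are all assumptions for illustration, and the actual LFM2 router may differ.

```python
import math
import random

NUM_EXPERTS = 32  # expert count stated in this model card
TOP_K = 4         # experts activated, stated in this model card


def route(logits, k=TOP_K):
    """Hypothetical router: keep the k highest-scoring experts and
    softmax-normalise their scores so the mixing weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores; a real model computes these from the hidden state.
random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(router_logits)
for expert_id, weight in selected:
    print(f"expert {expert_id:2d}  weight {weight:.3f}")
```

Only the selected experts run for a given input, which is why an 8B-parameter model of this kind can generate at the speed of a much smaller dense model.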

Speed: 50-100+ t/s on CPU // 200 t/s on most cards // 400+ t/s on a 5090 at quant Q6K [4 experts].

One example generation below.

Can also be used on phones // mobile devices.

IN-HOUSE BENCHMARKS [by Nightmedia]:

         arc-c  arc/e  boolq  hswag  obkqa  piqa   wino

LFM2-8B-A1B-GLM-4.7-Flash-Thinking-Quantum-IQ1C-P
q8-hi    0.529  0.744  0.745  0.658  0.412  0.760  0.597

LFM2-8B-A1B-GLM-4.7-Flash-Thinking-Quantum-IQ1C
mxfp8    0.495  0.709  0.759  0.658  0.404  0.764  0.596

---

BASE UNTUNED MODEL:

LFM2-8B-A1B
mxfp8    0.460  0.575  0.829  0.624  0.394  0.711  0.567
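
For a quick read of the benchmark numbers, the sketch below computes each tuned model's per-benchmark change against the base model. The scores are copied from the figures above; the dictionary keys and the `deltas` helper are illustrative names only. Note that the IQ1C-P row was measured at q8-hi while the base was measured at mxfp8, so that comparison is not strictly like-for-like.

```python
BENCHMARKS = ["arc-c", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]

# Scores copied from the benchmark figures in this model card.
SCORES = {
    "IQ1C-P q8-hi": [0.529, 0.744, 0.745, 0.658, 0.412, 0.760, 0.597],
    "IQ1C mxfp8": [0.495, 0.709, 0.759, 0.658, 0.404, 0.764, 0.596],
    "base mxfp8": [0.460, 0.575, 0.829, 0.624, 0.394, 0.711, 0.567],
}


def deltas(tuned, base):
    """Per-benchmark difference (tuned minus base), rounded to 3 decimals."""
    return [round(t - b, 3) for t, b in zip(tuned, base)]


base = SCORES["base mxfp8"]
results = {}
for name in ("IQ1C-P q8-hi", "IQ1C mxfp8"):
    results[name] = deltas(SCORES[name], base)
    print(name)
    for bench, diff in zip(BENCHMARKS, results[name]):
        print(f"  {bench:6s} {diff:+.3f}")
```

On these figures, both tuned variants score higher than the base on every benchmark except boolq.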

EXAMPLE GENERATION: [4 experts, Q6K]
