Instructions for using stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP with libraries and local apps.

Libraries

How to use stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP

Run Hermes

hermes

MLX LM

How to use stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP"
# In a second terminal, call the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default; adjust if you pass --port)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
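
The same request can be sketched from Python using only the standard library. This is a minimal sketch, not part of the upstream instructions: it assumes the server runs on mlx_lm.server's default port 8080 and returns the usual OpenAI chat-completions response shape. The helper names build_chat_request and chat are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP"
# Assumption: default mlx_lm.server port; change this if you started it with --port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600):
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response with choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, print(chat("Hello")) should print the reply. The generous timeout is a deliberate choice, since the first request may include model load time.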

Qwen3.6-35B-A3B Claude 4.7 Opus Reasoning Distilled - MLX oQ4 MTP

MLX/oMLX 4-bit conversion of r3lax/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled with the Qwen MTP (multi-token prediction) tensors preserved and runtime-tested in oMLX.

This is not a new training run or fine-tune. Weights were only converted/quantized for local MLX/oMLX inference.

Quick Facts

Architecture: Qwen3.6 35B-A3B MoE, roughly 3B active parameters per token.
Quantization: oQ4-style MLX 4-bit, group size 64.
MTP: preserved and verified in oMLX.
Test hardware: Apple Silicon M5 Pro with 48GB unified memory.
Runtime: tested in oMLX with native MTP enabled.

MTP Verification

MTP config: mtp_num_hidden_layers=1, mtp_use_dedicated_embeddings=false
MTP tensor entries: 42
MTP fusion projection: language_model.mtp.fc.weight present and full precision
Runtime: oMLX logged native MTP path activation during smoke tests

MTP support is runtime-specific. These tensors are preserved, but non-oMLX runtimes may ignore them.
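
One way to re-check the tensor-level claims on a local copy is to read the checkpoint's safetensors index. The sketch below is illustrative only and rests on two assumptions: that the download is sharded and ships a model.safetensors.index.json file, and that MTP tensor names contain ".mtp." (as language_model.mtp.fc.weight does). The function name list_mtp_tensors is hypothetical.

```python
import json
from pathlib import Path


def list_mtp_tensors(model_dir, marker=".mtp."):
    """Return the sorted tensor names that contain `marker` in the safetensors index."""
    # Assumption: a sharded checkpoint with the standard index file next to the shards.
    index_path = Path(model_dir) / "model.safetensors.index.json"
    weight_map = json.loads(index_path.read_text())["weight_map"]
    return sorted(name for name in weight_map if marker in name)
```

Calling list_mtp_tensors on the model directory and checking the length of the result, and whether "language_model.mtp.fc.weight" is in it, covers the entry count and the fusion projection. It says nothing about whether a runtime actually uses the tensors.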

Local Speed Smoke Test

Measured on an M5 Pro Mac with 48GB unified memory using oMLX.

MTP on average:  ~89.9 tok/s
MTP off average: ~86.0 tok/s
Speed lift:       ~+4.6% with native MTP enabled

Per-prompt MTP-on smoke results:

count120:   160 tokens, 1.726s, 92.69 tok/s, MTP accept 79/79 and 79/79
rain180:    220 tokens, 2.465s, 89.26 tok/s, MTP accept 96/123 and 98/121
jsoncities: 128 tokens, 1.457s, 87.86 tok/s, MTP accept 64/65 and 64/64

These are local smoke numbers, not a universal benchmark. Prompt, cache state, batching, oMLX version, and hardware will change results.
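
The arithmetic behind these figures can be reproduced with a few lines of Python. This is a sketch under stated assumptions: each a/b pair is read as accepted over proposed draft tokens, and the two pairs per prompt are pooled, neither of which the results above spell out. Because the reported times and averages are rounded, recomputed values can differ from the reported ones in the last digit.

```python
# Per-prompt MTP-on figures copied from the results above:
# (name, generated tokens, seconds, [(accepted, proposed), ...])
RUNS = [
    ("count120", 160, 1.726, [(79, 79), (79, 79)]),
    ("rain180", 220, 2.465, [(96, 123), (98, 121)]),
    ("jsoncities", 128, 1.457, [(64, 65), (64, 64)]),
]


def tokens_per_second(tokens, seconds):
    """Throughput as generated tokens divided by wall-clock seconds."""
    return tokens / seconds


def accept_rate(pairs):
    """Pooled acceptance: total accepted over total proposed (assumed meaning of a/b)."""
    accepted = sum(a for a, _ in pairs)
    proposed = sum(p for _, p in pairs)
    return accepted / proposed


def speed_lift(on_tps, off_tps):
    """Relative speedup of MTP-on over MTP-off, as a fraction."""
    return on_tps / off_tps - 1.0


for name, tokens, seconds, pairs in RUNS:
    tps = tokens_per_second(tokens, seconds)
    print(f"{name}: {tps:.2f} tok/s, accept {accept_rate(pairs):.1%}")
print(f"lift from rounded averages: {speed_lift(89.9, 86.0):.1%}")
```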

Usage Notes

Place the model under your oMLX model directory and enable native MTP:

~/.omlx/models/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP

Recommended oMLX settings:

{
  "mtp_enabled": true,
  "dflash_enabled": false,
  "turboquant_kv_enabled": false
}

Attribution

Credit to the upstream work:

Qwen/Qwen3.6-35B-A3B for the base model.
lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled for the Claude Opus 4.7 reasoning distillation.
r3lax/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled for the source checkpoint used here.
Anthropic Claude Opus 4.7 as the teacher model used by the upstream distillation.
oMLX / MLX community for the Apple Silicon runtime and MTP support.

Please credit the upstream authors if you use or redistribute this conversion.

Caveats

MTP runtime activation was verified in oMLX only.
This conversion is experimental.
Reasoning models can emit long <think> traces; set max_tokens explicitly so a long trace does not use up the budget before the final answer.
The upstream model card notes that distillation transfers reasoning style, not new factual knowledge.
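
As a rough illustration of the max_tokens caveat, the sketch below builds a chat payload with an explicit token budget and separates the final answer from the reasoning trace. It assumes the trace arrives inline in the message content and is closed by </think>; some runtimes return reasoning in a separate field, and some chat templates place the opening <think> tag in the prompt, so treat this as a starting point. The names chat_payload and split_answer, and the default of 2048, are hypothetical.

```python
MODEL_ID = "stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP"


def chat_payload(prompt, max_tokens=2048):
    """OpenAI-style chat payload with an explicit generation budget."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # covers the reasoning trace AND the final answer
    }


def split_answer(text):
    """Return (answer, cut_off_in_reasoning) for a raw completion string."""
    if "</think>" in text:
        # Trace finished: everything after the last closing tag is the answer.
        return text.rsplit("</think>", 1)[1].strip(), False
    if "<think>" in text:
        # Opened but never closed: the budget likely ran out mid-reasoning.
        return text.split("<think>", 1)[0].strip(), True
    # No tags at all: treat the whole text as the answer. A trace whose opening
    # tag lived in the prompt and was cut off cannot be detected here.
    return text.strip(), False
```

If split_answer reports a cut-off, retrying with a larger max_tokens is usually more useful than parsing the partial trace.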

License

Apache-2.0, following the upstream model metadata. Check upstream model cards and any teacher-data usage policies before redistribution or commercial deployment.

Downloads last month: 8,104

Safetensors

Model size: 35B params
Tensor type: BF16, U32

MLX

Quantization: 4-bit

Model tree for stamsam/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-MLX-oQ4-MTP

Base model: Qwen/Qwen3.6-35B-A3B
Quantized: this model (one of 418 quantized variants of the base model)