---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
library_name: mlx
pipeline_tag: image-text-to-text
thumbnail: dealign_mascot.png
tags:
- mlx
- apple-silicon
- abliterated
- uncensored
- crack
- mtp
- speculative-decoding
- vision
- video
- reasoning
- thinking
- harmbench
- mmlu
- qwen3_6
- image-text-to-text
- moe
---

Built for vMLX, the MLX inference engine with VL + video, KV-cache quantization, prefix-cache reuse, agentic tool calling, and native MTP speculative decoding.

Free for macOS · [vmlx.net](https://vmlx.net)

---

# Qwen 3.6 35B-A3B: MXFP4 CRACK + d3 MTP

**CRACK abliterated** · **MXFP4 (4-bit microscaling)** · **d3 MTP self-speculative (1.51× faster)** · Vision + Video · Reasoning toggle · 18 GB

---

## What Is This?

This is [Qwen 3.6 35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B), a vision-language Mixture-of-Experts model (256 routed experts, 10 active; hybrid SSM + full attention; 40 layers; native image + video understanding) that has been:

1. **CRACK abliterated**: refusal behavior removed at the weight level, so it complies across task categories instead of refusing, while keeping its knowledge, reasoning, and vision intact.
2. **MXFP4 (4-bit microscaling) quantized** for MLX on Apple Silicon: 18 GB.
3. **MTP-preserved**: the native multi-token-prediction head is kept and abliterated too, so **d3 self-speculative decoding works** (~1.51× faster) on an MTP-aware runtime (vMLX).

Vision **and video** processing are fully preserved.

## Results

Evaluated through the vMLX inference engine. HarmBench was scored with a strict classifier that rejects loops, empty or template dumps, and thinking-trace leakage. MMLU is the standard 57-subject multiple-choice benchmark.

| Metric | Result |
|---|---:|
| **HarmBench-320 (compliance / ASR)** | **99.4%** (318/320) |
| **MMLU (57-subject)** | **74.6%** |
| **d3 MTP speedup** | **1.51×** vs autoregressive |

Abliteration preserves the model's knowledge and reasoning: it stays coherent in both direct and reasoning modes.

## Features

- **Vision + video**: `image-text-to-text`, native frame/video understanding preserved.
- **d3 MTP speculative decoding**: native MTP head preserved and abliterated, giving ~1.51× faster generation on an MTP-aware runtime.
- **Reasoning toggle**: `enable_thinking=True` (default, full chain-of-thought) or `enable_thinking=False` (direct answers).

## Usage

Run with [vMLX](https://vmlx.net) (recommended; supports VL + video + native MTP) or an MLX runtime with Qwen 3.6 support.

Recommended sampling (from the model's `generation_config`): **temperature 1.0, top_p 0.95, top_k 20**.

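As a minimal sketch, the recommended sampling settings and the reasoning toggle can be packaged into a request from Python using only the standard library. The host and port in `BASE_URL` are assumptions (substitute whatever address your vMLX server reports), and the helper names `build_payload`, `build_request`, and `chat` are hypothetical, introduced only for illustration. The response is returned as raw JSON because its exact shape is not documented here.

```python
import json
import urllib.request

# Assumption: a vMLX server is reachable at this address (hypothetical host/port).
BASE_URL = "http://localhost:8000"
MODEL = "dealignai/Qwen3.6-35B-A3B-MXFP4-CRACK-MTP"


def build_payload(prompt, enable_thinking=True):
    """Return a chat-completions body using the recommended sampling settings."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 20,
        "enable_thinking": enable_thinking,
    }


def build_request(prompt, enable_thinking=True, base_url=BASE_URL):
    """Wrap the payload in a POST request to /v1/chat/completions."""
    body = json.dumps(build_payload(prompt, enable_thinking)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, enable_thinking=True, base_url=BASE_URL):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(prompt, enable_thinking, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `chat("Summarize this model card.", enable_thinking=False)` would request a direct answer without the chain-of-thought; building the request separately from sending it makes the payload easy to inspect before any network call.
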
```python
# vMLX OpenAI-compatible endpoint
# POST /v1/chat/completions
# Request body as a Python dict; serialize it with json.dumps() before sending.
payload = {
    "model": "dealignai/Qwen3.6-35B-A3B-MXFP4-CRACK-MTP",
    "messages": [{"role": "user", "content": "..."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "enable_thinking": True,  # set to False for direct answers
}
```

## About CRACK

**CRACK** (Controlled Refusal Ablation via Calibrated Knockouts) removes safety-refusal behavior at the weight level by projecting refusal directions out of the residual-stream writer matrices, with strengths calibrated to preserve reasoning quality and coherence.

## Support dealignai

All models are built from original research and released free.

**[Support us on Ko-fi](https://ko-fi.com/dealignai)**: membership gets early access and extras.

[Ko-fi](https://ko-fi.com/dealignai) · [X @dealignai](https://x.com/dealignai) · [dealign.ai](https://dealign.ai)

See our research: [Safety Generalization in Frontier Models](https://dealign.ai/quantsteer.html)

---

## Disclaimer

This model has had its safety-refusal behavior removed for research purposes. It will follow instructions across all categories without refusing. You are solely responsible for how you use it and for complying with all applicable laws. Published for AI-safety research and authorized security testing.