---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
library_name: mlx
pipeline_tag: image-text-to-text
thumbnail: dealign_mascot.png
tags:
- mlx
- apple-silicon
- abliterated
- uncensored
- crack
- mtp
- speculative-decoding
- vision
- video
- reasoning
- thinking
- harmbench
- mmlu
- qwen3_6
- image-text-to-text
- moe
---

# Qwen 3.6 35B-A3B — MXFP4 CRACK + d3 MTP
**CRACK abliterated** · **MXFP4 (4-bit microscaling)** · **d3 MTP self-speculative (1.51× faster)** · Vision + Video · Reasoning toggle · 18 GB

---
## What Is This?
This is [Qwen 3.6 35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B), a vision-language model
with native image and video understanding. Its architecture is a Mixture-of-Experts (256 routed
experts, 10 active) hybrid of SSM and full-attention layers, 40 layers in total. This release has been:

1. **CRACK abliterated**: refusal behavior is removed at the weight level, so the model complies across
task categories instead of refusing, while keeping its knowledge, reasoning, and vision intact.
2. **MXFP4 (4-bit microscaling) quantized** for MLX on Apple Silicon (18 GB).
3. **MTP-preserved**: the native multi-token-prediction (MTP) head is kept and abliterated too, so
**d3 self-speculative decoding works** (~1.51× faster) on an MTP-aware runtime (vMLX).

Vision **and video** processing are fully preserved.
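
To illustrate why self-speculative decoding can be faster without changing the output, here is a toy greedy draft-and-verify loop. It is only a sketch under stated assumptions: it assumes "d3" denotes a draft depth of three tokens (the card does not spell this out), and `toy_target` and `toy_draft` are hypothetical stand-ins for the main model and its MTP head, not vMLX APIs.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(context: List[int], target_next: NextToken,
                     draft_next: NextToken, depth: int = 3) -> List[int]:
    """Draft `depth` tokens cheaply, then keep the prefix the target agrees with."""
    drafted: List[int] = []
    ctx = list(context)
    for _ in range(depth):
        token = draft_next(ctx)
        drafted.append(token)
        ctx.append(token)

    # Verification. A real runtime would score all drafted positions in one
    # batched forward pass; this toy checks them one by one.
    accepted: List[int] = []
    ctx = list(context)
    for token in drafted:
        expected = target_next(ctx)
        if token != expected:
            accepted.append(expected)  # replace the first wrong draft and stop
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_next(ctx))  # every draft accepted: one bonus token
    return accepted


def generate_speculative(prompt: List[int], n_tokens: int, target_next: NextToken,
                         draft_next: NextToken, depth: int = 3) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, target_next, draft_next, depth))
    return out[:len(prompt) + n_tokens]


def generate_greedy(prompt: List[int], n_tokens: int, target_next: NextToken) -> List[int]:
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out


# Hypothetical toy models: the draft agrees with the target except when the
# context length is a multiple of four.
def toy_target(ctx: List[int]) -> int:
    return (sum(ctx) * 31 + 7) % 11


def toy_draft(ctx: List[int]) -> int:
    guess = toy_target(ctx)
    return (guess + 1) % 11 if len(ctx) % 4 == 0 else guess
```

Each step emits between one and `depth + 1` tokens, and every emitted token is what the target would have chosen greedily, so the speculative output matches plain greedy decoding. Any speedup comes from how often the drafts are accepted.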
## Results
All results were measured through the vMLX inference engine. HarmBench responses were scored with a
strict classifier that rejects loops, empty or template-dump outputs, and thinking-trace leakage.
MMLU is the standard 57-subject multiple-choice benchmark.

| Metric | Result |
|---|---:|
| **HarmBench-320 (compliance / ASR)** | **99.4%** (318/320) |
| **MMLU (57-subject)** | **74.6%** |
| **d3 MTP speedup** | **1.51×** vs autoregressive |

Abliteration preserves the model's knowledge and reasoning: it stays coherent in both direct
and reasoning modes.
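
As a quick sanity check on the figures above, the following sketch recomputes the HarmBench percentage from the raw counts and converts the speedup into a time saving. It assumes the 1.51× figure is a throughput ratio (tokens per second) against plain autoregressive decoding; the helper names are illustrative only.

```python
def percent(passed: int, total: int) -> float:
    """Share of passing items, in percent."""
    return 100.0 * passed / total


def time_saved_fraction(speedup: float) -> float:
    """Fraction of generation time saved, assuming `speedup` is a throughput ratio."""
    return 1.0 - 1.0 / speedup


print(f"HarmBench compliance: {percent(318, 320):.1f}%")        # 99.4%
print(f"Time saved at 1.51x: {time_saved_fraction(1.51):.1%}")  # 33.8%
```

Under that assumption, a 1.51× speedup means a fixed-length generation takes roughly two thirds of the autoregressive time.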
## Features
- **Vision + video** — `image-text-to-text`, native frame/video understanding preserved.
- **d3 MTP speculative decoding** — native MTP head preserved and abliterated → ~1.51× faster
generation on an MTP-aware runtime.
- **Reasoning toggle** — `enable_thinking=True` (default, full chain-of-thought) or
`enable_thinking=False` (direct answers).
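
With thinking enabled, client code often needs to separate the reasoning trace from the final answer. The helper below is a minimal sketch that assumes the runtime returns raw text with the reasoning wrapped in `<think>...</think>` tags, as Qwen-family chat templates typically do. If your runtime already returns the reasoning in a separate field, this step is unnecessary. `split_reasoning` is a hypothetical name, not part of vMLX.

```python
import re
from typing import Tuple

# Assumed delimiter format; adjust if your runtime uses different tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>2+2=4</think>\nThe answer is 4.")` returns `("2+2=4", "The answer is 4.")`.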
## Usage
Run with [vMLX](https://vmlx.net) (recommended; supports VL, video, and native MTP) or another MLX
runtime with Qwen 3.6 support.

Recommended sampling (from the model's `generation_config`): **temperature 1.0, top_p 0.95,
top_k 20**.
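```python
# Request body for the vMLX OpenAI-compatible endpoint
# POST /v1/chat/completions
payload = {
    "model": "dealignai/Qwen3.6-35B-A3B-MXFP4-CRACK-MTP",
    "messages": [{"role": "user", "content": "..."}],
    # Recommended sampling settings from the model's generation_config
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    # True: full chain-of-thought (default); False: direct answers
    "enable_thinking": True,
}
```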
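
A minimal client for this request, using only the Python standard library, might look like the sketch below. The base URL `http://localhost:8000` is an assumption (use whatever address your vMLX server listens on), `build_payload` and `chat` are illustrative helper names, and the response parsing assumes vMLX follows the standard OpenAI chat-completions response shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local vMLX server address
MODEL_ID = "dealignai/Qwen3.6-35B-A3B-MXFP4-CRACK-MTP"


def build_payload(prompt: str, thinking: bool = True) -> dict:
    """Build the request body with the recommended sampling settings."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 20,
        "enable_thinking": thinking,
    }


def chat(prompt: str, thinking: bool = True, base_url: str = BASE_URL) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_payload(prompt, thinking)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumes the OpenAI-style layout: choices[0].message.content
    return reply["choices"][0]["message"]["content"]
```

With a server running, `chat("Summarize MXFP4 in one sentence.", thinking=False)` would request a direct answer without a reasoning trace.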
## About CRACK
**CRACK** (Controlled Refusal Ablation via Calibrated Knockouts) removes safety-refusal behavior
at the weight level by projecting refusal directions out of the residual-stream writer matrices,
with strengths calibrated to preserve reasoning quality and coherence.
## Support dealignai
All models are built from original research and released free.

**[Support us on Ko-fi](https://ko-fi.com/dealignai)**: membership gets early access and extras.

[Ko-fi](https://ko-fi.com/dealignai) · [X @dealignai](https://x.com/dealignai) · [dealign.ai](https://dealign.ai)

See our research: [Safety Generalization in Frontier Models](https://dealign.ai/quantsteer.html)

---
## Disclaimer
This model has had its safety-refusal behavior removed for research purposes. It will follow
instructions across all categories without refusing. You are solely responsible for how you use
it and for complying with all applicable laws. Published for AI-safety research and authorized
security testing.