Bundled chat_template.jinja is chat-only — strips tools silently

by Raullen - opened Apr 29

MLX Community org Apr 29

Heads up that the chat_template.jinja shipped in this repo (and across the V4-Flash quant variants) only renders system/user/assistant messages — there's no branch for the tool role, no iteration over the tools array, and no <tool_call> markers. So when an OpenAI-compatible client passes tools=[...], the array is silently dropped by apply_chat_template and the model never knows tools were available.
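For anyone who wants to check a template before wiring it into a server, here is a minimal sketch of a static scan for the three things listed above. It is a text heuristic, not a Jinja parser, and the two template strings are hypothetical stand-ins written for illustration; they are not the file shipped in this repo or the one in the upstream PR. The function name `tool_support_markers` is likewise made up here.

```python
import re


def tool_support_markers(template_src: str) -> dict:
    """Heuristically report which tool-related constructs a chat template references."""
    return {
        # Any use of the `tools` variable, e.g. `{% if tools %}`.
        "tools_variable": bool(re.search(r"\btools\b", template_src)),
        # A comparison against the tool role, e.g. message['role'] == 'tool'.
        "tool_role": bool(re.search(r"""==\s*['"]tool['"]""", template_src)),
        # Any tool-call marker or field, e.g. <tool_call> or message.tool_calls.
        "tool_call_marker": "tool_call" in template_src,
    }


# Hypothetical chat-only template: system/user/assistant branches only.
CHAT_ONLY_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}{{ message['content'] }}"
    "{% elif message['role'] == 'user' %}User: {{ message['content'] }}"
    "{% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }}"
    "{% endif %}{% endfor %}"
)

# Hypothetical tool-aware template: iterates tools, handles the tool role.
TOOL_AWARE_TEMPLATE = (
    "{% if tools %}{% for tool in tools %}{{ tool | tojson }}{% endfor %}{% endif %}"
    "{% for message in messages %}"
    "{% if message['role'] == 'tool' %}{{ message['content'] }}{% endif %}"
    "{% if message.tool_calls %}<tool_call>{% endif %}"
    "{% endfor %}"
)

chat_only_report = tool_support_markers(CHAT_ONLY_TEMPLATE)
tool_aware_report = tool_support_markers(TOOL_AWARE_TEMPLATE)
```

If all three flags come back False for a real template file, that is a strong hint (though not proof) that a tools array passed to it will be ignored.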

We picked this up while shipping day-0 V4 support in rapid-mlx (Apple Silicon MLX backend, PR #168). Plain chat works perfectly on both the 2-bit DQ and 8-bit quants on a Mac Studio M3 Ultra (56 and 31 tok/s decode, respectively; 7/8 stress scenarios pass), but our 30-scenario tool-calling eval scored 0/30: every scenario logs tool_detected: False. Same outcome with the Hermes and OpenClaude agent profiles.

This is not a quant issue (identical 0/30 on 2-bit and 8-bit) and not a parser issue: the model never sees the tools list. We verified this by inspecting the rendered prompt.
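A rough sketch of that kind of prompt inspection, assuming the tools follow the OpenAI-style function schema and that the rendered prompt is already available as a string (for example, from apply_chat_template with tokenization turned off). The helper `missing_tool_names`, the `get_weather` tool, and the sample prompt are hypothetical and only illustrate the check; they are not taken from our eval harness.

```python
def missing_tool_names(rendered_prompt: str, tools: list) -> list:
    """Return names of function tools that never appear in the rendered prompt."""
    names = [
        tool["function"]["name"]
        for tool in tools
        if tool.get("type") == "function"
    ]
    return [name for name in names if name not in rendered_prompt]


# Hypothetical tool definition in the OpenAI-style schema.
example_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Hypothetical prompt as a chat-only template would render it: no tool block at all.
example_prompt = "User: What's the weather in Paris?\nAssistant:"

missing = missing_tool_names(example_prompt, example_tools)
```

A non-empty result means at least one tool definition did not make it into the prompt, which matches what we saw with the bundled template.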

There's an active PR #16 upstream on deepseek-ai/DeepSeek-V4-Flash (by @Rocketknight1, HF staff) adding a tool-supporting template, plus a follow-up alternative proposed by @qgallouedec. Would it be possible to pull whichever variant lands into the V4-Flash quant repos so users get tool calling out of the box?

Happy to test + report numbers once an updated template lands.

Thanks for the great quant work — the model itself runs beautifully on Apple Silicon.

pcuenq

MLX Community org Apr 29

edited Apr 29

I have tested the version shared by @Rocketknight1 in that PR, and it passes the same tool calling tests as the custom encoding code.
