Benchmarks
Every modified model published by Schismatic is measured against the original it came from. The numbers below are extracted automatically from the benchmarks/ directory in the source repository.
What the columns mean
- Refusal (keyword) — out of 100 challenging prompts, how many the modified model still refuses, judged by a simple word-list scanner. Lower is better; 0% is the goal.
- Refusal (Qwen3Guard) — the same 100 prompts judged by a separate AI safety classifier. Stricter than the keyword scanner.
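- KL@3 — Kullback-Leibler divergence: how much the modified model drifts from the original on the first three generated words of normal text. Lower is better; 0 means identical. The KL@1 and KL@8 columns in the tables are the same measure over the first one and the first eight words.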
- Coherence — fraction of replies that pass the coherence check, meaning they read as normal, coherent English (no babbling or repetition). Higher is better.
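To make the keyword refusal rate and the KL columns more concrete, here is a minimal Python sketch of how such metrics could be computed. It is not the code used to produce the numbers on this page. The marker list `REFUSAL_MARKERS`, the function names, and the choice to average the divergence over the first k positions are all assumptions made for illustration. A real pipeline would most likely compare next-token probability distributions rather than whole words.

```python
import math

# Hypothetical refusal markers; the actual word list is not given in this document.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")


def keyword_refusal_rate(replies, markers=REFUSAL_MARKERS):
    """Fraction of replies containing at least one marker (case-insensitive)."""
    if not replies:
        raise ValueError("replies must not be empty")
    refused = sum(any(m in reply.lower() for m in markers) for reply in replies)
    return refused / len(replies)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two probability vectors over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def kl_at_k(original_dists, modified_dists, k):
    """Mean KL(original || modified) over the first k generated positions.

    Averaging (rather than summing) over positions is an assumption.
    """
    pairs = list(zip(original_dists, modified_dists))[:k]
    if not pairs:
        raise ValueError("need at least one position")
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy example: four replies, two of which match a refusal marker.
replies = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an overview.",
    "Here you go.",
    "I cannot assist with this request.",
]
print(f"keyword refusal rate: {keyword_refusal_rate(replies):.0%}")

# Toy next-position distributions over a three-entry vocabulary.
original = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]
modified = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
for k in (1, 3):
    print(f"KL@{k}: {kl_at_k(original, modified, k):.4f}")
```

Under this sketch, a model whose distributions match the original at every position scores exactly 0, which is consistent with the "0 means identical" reading above.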
Qwen3.6-27B
| Variant | Refusal (keyword) | Refusal (Qwen3Guard) | KL@1 | KL@3 | KL@8 | Coherence |
|---|---|---|---|---|---|---|
| schismatic-tpe (ours) | 0% | 100% | 0.0110 | 0.0713 | 0.0829 | 100% |
| youssofal-heretic | 0% | 100% | 0.0258 | 0.1753 | 0.1346 | 100% |
| aeon7-trial46 | 0% | 31% | 0.5791 | 0.2839 | 0.4363 | 100% |