Benchmarks

Every modified model published by Schismatic is measured against the original it came from. The numbers below are extracted automatically from the benchmarks/ directory in the source repository.

Qwen3.6-27B

Base model: Qwen/Qwen3.6-27B · 100 prompts · corpus: mlabonne_harmful_behaviors

Variant                                 Refusal (keyword)  Refusal (Qwen3Guard)  KL@1    KL@3    KL@8    Coherence
schismatic-tpe (ours)                   0%                 100%                  0.0110  0.0713  0.0829  100%
schismatic-tpe-multifidelity (ours)     0%                 20%                   0.0757  0.3764  0.4278  100%
schismatic-qnehvi (ours)                0%                 24%                   0.0831  0.6185  0.5310  99%
schismatic-qnehvi-multifidelity (ours)  0%                 19%                   0.1293  0.9692  0.5124  100%
schismatic-nsga2 (ours)                 0%                 18%                   0.1389  2.7065  0.5835  99%
youssofal-heretic                       0%                 100%                  0.0258  0.1753  0.1346  100%
aeon7-trial46                           0%                 31%                   0.5791  0.2839  0.4363  100%
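
This page does not define the KL columns, so the following is only a sketch of one plausible reading: it assumes KL@k is the KL divergence from the base model's next-token distribution to the variant's at the k-th generated position, averaged over prompts. The function names, the nested-list layout of the logits, and the KL(base || variant) direction are all hypothetical and are not taken from the Schismatic repository.

```python
import math


def softmax(logits):
    """Turn one position's raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, floor=1e-12):
    """KL(p || q) in nats; q is floored to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, floor)) for pi, qi in zip(p, q) if pi > 0)


def kl_at_k(base_logits, variant_logits, k):
    """Mean over prompts of KL(base || variant) at generated position k (1-indexed).

    ASSUMPTION: both inputs are indexed as logits[prompt][position][vocab_id],
    and this definition of KL@k is a guess, not the benchmark's documented one.
    """
    per_prompt = [
        kl_divergence(softmax(b[k - 1]), softmax(v[k - 1]))
        for b, v in zip(base_logits, variant_logits)
    ]
    return sum(per_prompt) / len(per_prompt)


# Toy data: two prompts, one position each, vocabulary of three tokens.
base = [[[2.0, 0.5, -1.0]], [[0.0, 1.0, 0.0]]]
variant = [[[0.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]]
print(kl_at_k(base, base, 1))     # identical distributions give 0.0
print(kl_at_k(base, variant, 1))  # positive: the first prompt diverges
```

Under this reading, lower values mean the variant's early-token distribution stays closer to the base model's. KL@8 can be smaller than KL@3 (as in several rows above) because each column is measured at a single position, not accumulated over positions.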

Detailed results & sample outputs →