meta-writing-style-detector
meta-writing-style-detector is a deterministic Verifiers environment for
probing whether small models can learn a lightweight theory of writing style.
The task is binary authorship/style detection: decide whether a sentence is
derived from Lewis Carroll's public-domain Alice's Adventures in Wonderland
or is a generated imitation about similar characters and situations.
The environment is deliberately cheap:
- no tools
- no sandbox
- no judge model
- deterministic source, imitation, and counterfeit generation
- compact JSON output
Prompts ask for exactly one result tag:
<result>{"label":"book","confidence":0.82}</result>
Allowed labels are book and imitation. Confidence must be a number from
0.0 to 1.0.
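
A minimal sketch of how a completion in this format could be checked, using only the Python standard library. The function name `parse_result` and the exact meaning given to each flag are assumptions made for illustration; the flag names are borrowed from the metric list, but the environment's own parser may define them differently.

```python
import json
import re

# Matches the payload of every <result>...</result> tag in a completion.
RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)
ALLOWED_LABELS = {"book", "imitation"}


def parse_result(completion: str) -> dict:
    """Hypothetical helper: extract and validate the single result tag."""
    matches = RESULT_RE.findall(completion)
    out = {
        "exact_one_result": len(matches) == 1,
        "parseable": False,
        "schema_valid": False,
        "confidence_in_range": False,
        "label": None,
        "confidence": None,
    }
    if len(matches) != 1:
        return out

    try:
        payload = json.loads(matches[0])
    except json.JSONDecodeError:
        return out
    if not isinstance(payload, dict):
        return out
    out["parseable"] = True

    label = payload.get("label")
    confidence = payload.get("confidence")
    label_ok = isinstance(label, str) and label in ALLOWED_LABELS
    # bool is a subclass of int in Python, so exclude it explicitly.
    number_ok = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)

    out["schema_valid"] = label_ok and number_ok
    out["confidence_in_range"] = number_ok and 0.0 <= confidence <= 1.0
    if label_ok:
        out["label"] = label
    if number_ok:
        out["confidence"] = float(confidence)
    return out


if __name__ == "__main__":
    print(parse_result('<result>{"label":"book","confidence":0.82}</result>'))
```

Keeping the format checks separate from the label check means a completion with the right answer in the wrong shape can be scored as a formatting failure rather than a classification failure.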
Metrics separate classification from formatting and calibration:
- label_exact
- book_correct
- imitation_correct
- predicted_book
- parseable
- exact_one_result
- schema_valid
- confidence_in_range
- calibration_score
- overconfident_wrong
- underconfident_correct
- raw_json
- output_chars
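
To illustrate how calibration could be scored separately from correctness, here is one possible sketch. The formulas are not taken from the environment: the Brier-style definition of `calibration_score` and the 0.8 and 0.5 thresholds are assumptions chosen only for illustration, and `calibration_metrics` is a hypothetical helper name.

```python
# Assumed thresholds; the environment's real cutoffs are not documented here.
OVERCONFIDENT_THRESHOLD = 0.8
UNDERCONFIDENT_THRESHOLD = 0.5


def calibration_metrics(predicted: str, gold: str, confidence: float) -> dict:
    """Hypothetical helper: score one prediction's confidence against its outcome."""
    correct = 1.0 if predicted == gold else 0.0
    # Brier-style score: 1.0 when confidence matches the outcome exactly,
    # 0.0 when the model is fully confident and wrong (or fully unsure and right).
    score = 1.0 - (confidence - correct) ** 2
    return {
        "label_exact": correct,
        "calibration_score": score,
        "overconfident_wrong": float(correct == 0.0 and confidence >= OVERCONFIDENT_THRESHOLD),
        "underconfident_correct": float(correct == 1.0 and confidence <= UNDERCONFIDENT_THRESHOLD),
    }


if __name__ == "__main__":
    print(calibration_metrics("book", "book", 0.82))
    print(calibration_metrics("book", "imitation", 0.9))
```

Under these assumptions, a wrong answer given with low confidence scores better on calibration than a wrong answer given with high confidence, even though both count the same for classification.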
Source text notes:
- Book sentences are manually selected and ASCII-normalized from the Project Gutenberg text of Alice's Adventures in Wonderland.
- easy, balanced, and near_miss imitation sentences are deterministic local templates, not model-generated.
- counterfeit imitation sentences are harder negatives made by lightly altering or splicing real book sentences, so they preserve much more of the source style while no longer being exact copied text.
- Future versions can replace the imitation generator with recorded LLM generations, while keeping the same parser, metrics, and eval matrix.