meta-writing-style-detector

meta-writing-style-detector is a deterministic Verifiers environment for probing whether small models can learn a lightweight theory of writing style. The task is binary authorship/style detection: decide whether a sentence is derived from Lewis Carroll's public-domain Alice's Adventures in Wonderland or is a generated imitation about similar characters and situations.

The environment is deliberately cheap:

no tools
no sandbox
no judge model
deterministic source, imitation, and counterfeit generation
compact JSON output

Prompts ask for exactly one result tag:

<result>{"label":"book","confidence":0.82}</result>

Allowed labels are book and imitation. Confidence must be a number from 0.0 to 1.0.
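
The output contract above can be checked with a small parser. The following is a minimal sketch, not the environment's actual parser: the function name `parse_result`, the regular expression, and the exact meaning assigned to each flag are assumptions made for illustration.

```python
import json
import re

# Assumption: a result is anything between a literal <result> and </result>.
RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)
ALLOWED_LABELS = {"book", "imitation"}


def parse_result(completion: str) -> dict:
    """Return format flags plus the extracted label and confidence (hypothetical helper)."""
    matches = RESULT_RE.findall(completion)
    out = {
        "exact_one_result": len(matches) == 1,
        "parseable": False,
        "schema_valid": False,
        "confidence_in_range": False,
        "label": None,
        "confidence": None,
    }
    if not matches:
        return out
    try:
        # If several tags are present, only the first one is inspected.
        payload = json.loads(matches[0])
    except json.JSONDecodeError:
        return out
    if not isinstance(payload, dict):
        return out
    out["parseable"] = True

    label = payload.get("label")
    conf = payload.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    label_ok = isinstance(label, str) and label in ALLOWED_LABELS

    out["schema_valid"] = label_ok and is_number
    out["confidence_in_range"] = is_number and 0.0 <= conf <= 1.0
    if out["schema_valid"]:
        out["label"] = label
        out["confidence"] = float(conf)
    return out
```

Keeping the flags independent means a completion with a well-formed tag but an out-of-range confidence still counts as schema-valid in this sketch, so formatting failures and range failures can be counted separately.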

Metrics separate classification from formatting and calibration:

label_exact
book_correct
imitation_correct
predicted_book
parseable
exact_one_result
schema_valid
confidence_in_range
calibration_score
overconfident_wrong
underconfident_correct
raw_json
output_chars
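
The formulas behind the calibration metrics are not given here. As an illustration only, the sketch below uses a Brier-style score and two confidence thresholds; the function name `calibration_metrics`, the score definition, and the `high` and `low` thresholds are all assumptions, not the environment's definitions.

```python
def calibration_metrics(predicted: str, truth: str, confidence: float,
                        high: float = 0.8, low: float = 0.6) -> dict:
    """Hypothetical per-example calibration metrics.

    Assumptions: calibration_score is 1 minus the squared gap between the
    confidence and the 0/1 correctness target; `high` and `low` are
    illustrative thresholds chosen for this sketch.
    """
    correct = predicted == truth
    target = 1.0 if correct else 0.0
    return {
        "label_exact": float(correct),
        "calibration_score": 1.0 - (confidence - target) ** 2,
        # Wrong answer stated with high confidence.
        "overconfident_wrong": float((not correct) and confidence >= high),
        # Right answer stated with low confidence.
        "underconfident_correct": float(correct and confidence <= low),
    }
```

Under these assumptions a confident correct answer scores near 1.0, a confident wrong answer scores near 0.0, and a 0.5 confidence scores 0.75 either way, which is what lets calibration be read separately from raw accuracy.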

Source text notes:

Book sentences are manually selected and ASCII-normalized from the Project Gutenberg text of Alice's Adventures in Wonderland.
The easy, balanced, and near_miss imitation sentences come from deterministic local templates; they are not model-generated.
The counterfeit imitation sentences are harder negatives made by lightly altering or splicing real book sentences, so they preserve much more of the source style without being verbatim copies of the text.
Future versions can replace the imitation generator with recorded LLM generations, while keeping the same parser, metrics, and eval matrix.
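
Deterministic splicing of the kind described for counterfeit sentences can be sketched as follows. This is an assumed approach for illustration, not the environment's generator: the function name `splice_counterfeit`, the half-and-half word split, and the use of a seeded `random.Random` are all hypothetical choices.

```python
import random


def splice_counterfeit(sentences: list[str], seed: int) -> str:
    """Join the first half of one sentence to the second half of another.

    Hypothetical helper: a private, seeded generator keeps the choice of
    sentences reproducible for a given seed and input list.
    """
    rng = random.Random(seed)
    # sample() picks two distinct positions, so a sentence is never spliced with itself.
    first, second = rng.sample(sentences, 2)
    head = first.split()
    tail = second.split()
    spliced = head[: len(head) // 2] + tail[len(tail) // 2:]
    return " ".join(spliced)
```

Calling the function twice with the same list and seed returns the same string, which is the property that keeps the dataset reproducible without recording generations. A real generator would also need to reject splices that happen to reproduce a source sentence.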