czech-beer-brand-name

Overview

Environment ID: czech-beer-brand-name
Short description: Predict a Czech beer brand from bottle appearance descriptions.
Tags: czech, beer, classification, vision, ocr

Task

Type: single-turn classification
Output format expectations: reasoning allowed, final brand must be inside <guess>...</guess>
Rubric:

brand_exact_match_weighted (weight 1.0): exact match against the canonical brand name or any of its aliases, scaled by the reward multiplier for the row's description level.
brand_exact_match (metric only): unweighted exact match for plain accuracy tracking.
format_reward (weight set by format_reward_weight): requires the final answer in a valid XML answer tag (default <guess>...</guess>).
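
The rubric can be pictured with a minimal sketch like the one below. It is not the environment's actual scoring code: the helper names, the case-insensitive normalization, and the choice to read the last answer tag are all assumptions made for illustration. The defaults mirror the documented `answer_tag` and `wrong_answer_penalty` arguments.

```python
import re
from typing import Iterable, Optional


def extract_guess(completion: str, tag: str = "guess") -> Optional[str]:
    """Return the text inside the last <tag>...</tag> pair, or None if absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    matches = re.findall(pattern, completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def normalize(name: str) -> str:
    # Assumption: matching ignores case and repeated whitespace.
    return " ".join(name.casefold().split())


def weighted_reward(
    completion: str,
    canonical: str,
    aliases: Iterable[str],
    level_multiplier: float = 1.0,
    wrong_answer_penalty: float = -0.1,
) -> float:
    """Score one completion: level multiplier if correct, penalty otherwise."""
    guess = extract_guess(completion)
    accepted = {normalize(canonical)} | {normalize(a) for a in aliases}
    if guess is not None and normalize(guess) in accepted:
        return 1.0 * level_multiplier
    return wrong_answer_penalty
```

With this sketch, a completion that has no answer tag and one that names the wrong brand both receive the penalty, which matches the description of `wrong_answer_penalty`.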

No Hardcoded Brands

Brand names are not defined in code.
The environment derives the full brand set from dataset_path (and eval_dataset_path if provided).
Optional aliases can be provided per row via aliases.
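
As a rough illustration of deriving the brand set from data rather than code, the sketch below collects brands and their aliases from JSONL lines. The function names `parse_jsonl` and `collect_brands` are hypothetical; only the `brand` and `aliases` fields come from the row format described in this README.

```python
import json
from typing import Dict, Iterable, List, Set


def parse_jsonl(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL text lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def collect_brands(rows: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each brand found in the rows to the union of its per-row aliases."""
    brands: Dict[str, Set[str]] = {}
    for row in rows:
        # `aliases` is optional, so treat a missing or null value as empty.
        brands.setdefault(row["brand"], set()).update(row.get("aliases") or [])
    return brands
```

To cover both files, the rows from dataset_path and eval_dataset_path could be concatenated before calling `collect_brands`.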

Image Dataset Workflow

Put bottle images in:

environments/czech_beer_brand_name/data/images/

Use either:

Filename labels (quick start):

pilsner_urquell_01.jpg
kozel_12.png

Explicit labels file (labels.jsonl):

one row per image with image_path and brand
copy template from environments/czech_beer_brand_name/data/labels.template.jsonl
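
The two labeling options can be bridged with a small helper that turns filename labels into labels.jsonl rows. This is a sketch under assumptions: it treats a trailing `_<digits>` token as an image index and underscores as word separators, which may not match the parsing done by build_image_ocr_dataset.py. The function names are hypothetical.

```python
import json
import re
from pathlib import PurePosixPath


def brand_from_filename(filename: str) -> str:
    """Guess a brand label from a name like 'pilsner_urquell_01.jpg'."""
    stem = PurePosixPath(filename).stem
    # Assumption: a trailing _<digits> token is an index, not part of the brand.
    stem = re.sub(r"_\d+$", "", stem)
    return stem.replace("_", " ")


def label_row(image_path: str) -> str:
    """Build one labels.jsonl line with the image_path and brand fields."""
    row = {"image_path": image_path, "brand": brand_from_filename(image_path)}
    return json.dumps(row, ensure_ascii=False)
```

Writing explicit labels is the safer choice when a brand name itself ends in a number, since the filename heuristic above would strip it.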

Generate OCR descriptions (bottle shape, logo, and other distinctive visual features; no taste notes):

export OPENROUTER_API_KEY=your_key_here

python environments/czech_beer_brand_name/scripts/build_image_ocr_dataset.py \
  --images-dir environments/czech_beer_brand_name/data/images \
  --labels-jsonl environments/czech_beer_brand_name/data/labels.jsonl \
  --out environments/czech_beer_brand_name/data/beer_bottles_ocr.jsonl \
  --model minimax/minimax-m2

Output row format:

required: brand, description
optional: description_short, description_brief, description_minimal
optional: aliases (list), image_path, source, label_method, id
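
A quick check of the row format might look like the sketch below. The mapping from level names to fields (in particular `detailed` to `description`) is an assumption inferred from the field names, and both helper functions are hypothetical.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("brand", "description")

# Assumption: the "detailed" level is stored in the plain `description` field.
LEVEL_FIELDS: Dict[str, str] = {
    "detailed": "description",
    "short": "description_short",
    "brief": "description_brief",
    "minimal": "description_minimal",
}


def missing_required(row: dict) -> List[str]:
    """Return the required fields that are absent or empty in a row."""
    return [k for k in REQUIRED_FIELDS if not str(row.get(k) or "").strip()]


def available_levels(row: dict) -> List[str]:
    """Return the description levels for which the row has non-empty text."""
    return [
        level
        for level, field in LEVEL_FIELDS.items()
        if str(row.get(field) or "").strip()
    ]
```

Running `missing_required` over every row before training is a cheap way to catch OCR runs that produced empty descriptions.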

Generate 3 or 4 description levels from existing full descriptions:

export OPENROUTER_API_KEY=your_key_here

python environments/czech_beer_brand_name/scripts/generate_description_levels.py \
  --in environments/czech_beer_brand_name/data/beer_bottles_ocr.jsonl \
  --out environments/czech_beer_brand_name/data/beer_bottles_ocr_levels.jsonl \
  --model openai/gpt-4.1-mini \
  --num-levels 3

Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| `dataset_path` | str | required | JSONL with labeled OCR rows |
| `eval_dataset_path` | str | `""` | optional separate eval JSONL |
| `eval_size` | int | `40` | split size if no `eval_dataset_path` |
| `include_candidates` | bool | `true` | include candidate brand list in prompt |
| `seed` | int | `7` | deterministic split/shuffle seed |
| `description_levels` | str | `"detailed,short,brief"` | comma-separated levels to include (`detailed,short,brief,minimal`) |
| `detailed_level_reward` | float | `1.0` | reward multiplier for detailed descriptions |
| `short_level_reward` | float | `1.0` | reward multiplier for short descriptions |
| `brief_level_reward` | float | `1.0` | reward multiplier for brief descriptions |
| `minimal_level_reward` | float | `1.0` | reward multiplier for minimal descriptions |
| `wrong_answer_penalty` | float | `-0.1` | reward returned when predicted brand is wrong or unparsable |
| `format_reward_weight` | float | `0.0` | rubric weight for XML format reward metric |
| `answer_tag` | str | `"guess"` | XML tag used for final answer (model must answer in `<tag>...</tag>`) |
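
To make the interaction between `eval_size` and `seed` concrete, here is a minimal sketch of a deterministic split. It assumes a seeded shuffle followed by slicing off the first `eval_size` rows; the environment's real split logic may differ, and `split_rows` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def split_rows(
    rows: Sequence[dict], eval_size: int = 40, seed: int = 7
) -> Tuple[List[dict], List[dict]]:
    """Shuffle rows with a fixed seed and return (train_rows, eval_rows)."""
    shuffled = list(rows)
    # A private Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    # Never request more eval rows than exist.
    eval_size = min(eval_size, len(shuffled))
    return shuffled[eval_size:], shuffled[:eval_size]
```

Because the shuffle depends only on `seed` and the input order, repeated runs over the same file give the same train and eval rows.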

Quickstart

prime eval run czech-beer-brand-name -m gpt-4.1-mini -n 20 -r 1 \
  -a '{"dataset_path":"environments/czech_beer_brand_name/data/beer_bottles_ocr.jsonl","eval_size":8}' -s
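
Hand-writing the JSON passed to `-a` is error-prone once more arguments are involved. The sketch below builds the same command from a Python dict; it uses only the flags shown in the quickstart, and the helper name and its parameter names are hypothetical.

```python
import json
import shlex
from typing import List


def build_eval_command(env_args: dict, model: str = "gpt-4.1-mini",
                       n: int = 20, r: int = 1) -> str:
    """Return a shell-safe version of the quickstart command line."""
    parts: List[str] = [
        "prime", "eval", "run", "czech-beer-brand-name",
        "-m", model,
        "-n", str(n),
        "-r", str(r),
        "-a", json.dumps(env_args),  # environment arguments as one JSON string
        "-s",
    ]
    # shlex.quote protects the JSON braces and quotes from the shell.
    return " ".join(shlex.quote(p) for p in parts)


if __name__ == "__main__":
    print(build_eval_command({
        "dataset_path": "environments/czech_beer_brand_name/data/beer_bottles_ocr.jsonl",
        "eval_size": 8,
    }))
```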