0

Horse Grimace RL Env (Openfarm)

Fresh

Horse Grimace Scale-style facial-region pain scoring benchmark.

Type
RL Env
Publisher
Openfarm
Runtime
single-turn
License
unknown
Size
v0.1.0
Published
May 2026

Cite

Notes

Only stored in your browser.

openfarm-horse-grimace

GitHub Prime Intellect Environments Hub Hugging Face Dataset


Overview

  • Environment ID: openfarm-horse-grimace
  • Short description: Horse Grimace Scale-style facial-region scoring from equine face crops.
  • Tags: animal-welfare, horse, equine, facial-expression, grimace-scale, pain-assessment, vision, eval

Datasets

Task

  • Type: single-turn multimodal image classification
  • Default task: region_score
  • Supported tasks:
    • region_score: answer 0, 1, or 2 for an HGS-style facial-region cue.
    • binary_pain: answer 0 or 1, derived from source scores (0 = no cue, 1/2 = cue present).
  • Output format: XML tags. Depending on require_explanation, models must output either <answer>...</answer> or both <explanation>...</explanation> and <answer>...</answer>.
  • Rubric: exact-match answer reward, optional XML format reward, and optional grounded reasoning judge against an equine grimace-scale cheatsheet.

Quickstart

prime eval run openfarm-horse-grimace

Configure task and split:

prime eval run openfarm-horse-grimace \
  -m google/gemini-3-flash-preview \
  -n 20 -r 3 \
  -a '{"task": "region_score", "test_split": "test", "require_explanation": true}'

Environment Arguments

ArgTypeDefaultDescription
dataset_idstr"oliveirabruno01/openfarm-horse-grimace-region"Hugging Face dataset ID.
dataset_revisionstr/nullNoneOptional dataset revision.
test_splitstr"test"Split to evaluate.
taskstr"region_score"One of region_score or binary_pain.
max_examplesint-1Limit examples after shuffling.
max_examples_per_taskint/nullNoneCompatibility alias for max_examples.
seedint42Dataset shuffle seed.
include_region_contextbooltrueInclude the source crop region in the prompt.
require_explanationbooltrueRequire an explanation before the answer.
require_reasoningbool/nullNoneCompatibility alias for require_explanation.
accuracy_reward_weightfloat1.0Exact-match reward weight.
judge_reward_weightfloat1.0Optional grounded-judge reward weight.
format_reward_weightfloat0.0XML format reward weight.
reward_weightsdict/nullNoneOptional aliases: accuracy, judge, format or full reward names.
judge_modestr"none"none, self, or external.
judge_modelstr"gpt-4o-mini"Model used when judge_mode="external".
system_prompt_overridestr/nullNoneReplace the default system prompt.
user_prompt_overridestr/nullNoneReplace the default user prompt template.
cheatsheet_versionstr"full"full or short for judge grounding.
cheatsheet_overridestr/nullNoneReplace the default judge cheatsheet.

Metrics

MetricMeaning
rewardWeighted scalar reward.
accuracy_rewardExact match after answer normalization.
format_rewardXML format reward when enabled.
judge_rewardOptional biological-reasoning score against the equine cheatsheet.