openfarm-horse-grimace
Overview
- Environment ID: `openfarm-horse-grimace`
- Short description: Horse Grimace Scale-style facial-region scoring from equine face crops.
- Tags: animal-welfare, horse, equine, facial-expression, grimace-scale, pain-assessment, vision, eval
Datasets
- Primary dataset: `oliveirabruno01/openfarm-horse-grimace-region`
- Source: Mendeley Data, "Automatic Pain Assessment in Horses" (DOI: 10.17632/t8rtzcgwxm.3), CC-BY-4.0.
- Related paper: "Pain assessment in horses using automatic facial expression recognition through deep learning-based modeling" (DOI: 10.1371/journal.pone.0258672).
- Default split: `test`
Task
- Type: single-turn multimodal image classification
- Default task: `region_score`
- Supported tasks:
  - `region_score`: answer `0`, `1`, or `2` for an HGS-style facial-region cue.
  - `binary_pain`: answer `0` or `1`, derived from the source scores (source score `0` = no cue; source score `1` or `2` = cue present).
- Output format: XML tags. Depending on `require_explanation`, models must output either `<answer>...</answer>` alone or both `<explanation>...</explanation>` and `<answer>...</answer>`.
- Rubric: exact-match answer reward, optional XML format reward, and an optional grounded-reasoning judge that scores the explanation against an equine grimace-scale cheatsheet.
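
To illustrate the output format and the exact-match rubric described above, here is a minimal sketch. It is not the environment's actual implementation: the helper names (`extract_answer`, `to_binary_label`, `exact_match_reward`) and the choice to read the last `<answer>` tag are assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical helpers; names and parsing details are assumptions.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL | re.IGNORECASE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the stripped content of the last <answer> tag, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def to_binary_label(region_score: int) -> int:
    """Map a source score (0, 1, 2) to binary_pain: 0 stays 0; 1 and 2 become 1."""
    if region_score not in (0, 1, 2):
        raise ValueError(f"unexpected region score: {region_score}")
    return 0 if region_score == 0 else 1


def exact_match_reward(completion: str, target: int) -> float:
    """Return 1.0 if the parsed answer equals the target label, else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer == str(target) else 0.0


completion = "<explanation>Ears held stiffly backwards.</explanation><answer> 2 </answer>"
print(extract_answer(completion))         # 2
print(exact_match_reward(completion, 2))  # 1.0
print(to_binary_label(2))                 # 1
```

A completion with no `<answer>` tag yields `None` from `extract_answer` and therefore a reward of `0.0` in this sketch.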
Quickstart
prime eval run openfarm-horse-grimace
Configure task and split:
prime eval run openfarm-horse-grimace \
-m google/gemini-3-flash-preview \
-n 20 -r 3 \
-a '{"task": "region_score", "test_split": "test", "require_explanation": true}'
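
The JSON passed to `-a` in the command above can be awkward to quote by hand. The following Python sketch generates an equivalent command line using only the standard library. It assumes nothing beyond the flags shown above and the argument names documented in this README; the variable names and example values are illustrative.

```python
import json
import shlex

# Argument names come from the Environment Arguments table; values are examples.
env_args = {
    "task": "binary_pain",
    "test_split": "test",
    "require_explanation": False,
}

# json.dumps serializes Python False as JSON false.
payload = json.dumps(env_args)

command = [
    "prime", "eval", "run", "openfarm-horse-grimace",
    "-n", "20", "-r", "3",
    "-a", payload,
]

# shlex.join quotes each element so the JSON survives shell parsing intact.
print(shlex.join(command))
```

Printing the command rather than executing it makes it easy to inspect the quoting before running an evaluation.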
Environment Arguments
| Arg | Type | Default | Description |
|---|---|---|---|
| `dataset_id` | str | `"oliveirabruno01/openfarm-horse-grimace-region"` | Hugging Face dataset ID. |
| `dataset_revision` | str/null | `None` | Optional dataset revision. |
| `test_split` | str | `"test"` | Split to evaluate. |
| `task` | str | `"region_score"` | One of `region_score` or `binary_pain`. |
| `max_examples` | int | `-1` | Limit examples after shuffling. |
| `max_examples_per_task` | int/null | `None` | Compatibility alias for `max_examples`. |
| `seed` | int | `42` | Dataset shuffle seed. |
| `include_region_context` | bool | `true` | Include the source crop region in the prompt. |
| `require_explanation` | bool | `true` | Require an explanation before the answer. |
| `require_reasoning` | bool/null | `None` | Compatibility alias for `require_explanation`. |
| `accuracy_reward_weight` | float | `1.0` | Exact-match reward weight. |
| `judge_reward_weight` | float | `1.0` | Optional grounded-judge reward weight. |
| `format_reward_weight` | float | `0.0` | XML format reward weight. |
| `reward_weights` | dict/null | `None` | Optional aliases: `accuracy`, `judge`, `format`, or full reward names. |
| `judge_mode` | str | `"none"` | `none`, `self`, or `external`. |
| `judge_model` | str | `"gpt-4o-mini"` | Model used when `judge_mode="external"`. |
| `system_prompt_override` | str/null | `None` | Replace the default system prompt. |
| `user_prompt_override` | str/null | `None` | Replace the default user prompt template. |
| `cheatsheet_version` | str | `"full"` | `full` or `short` for judge grounding. |
| `cheatsheet_override` | str/null | `None` | Replace the default judge cheatsheet. |
Metrics
| Metric | Meaning |
|---|---|
| `reward` | Weighted scalar reward. |
| `accuracy_reward` | Exact match after answer normalization. |
| `format_reward` | XML format reward when enabled. |
| `judge_reward` | Optional biological-reasoning score against the equine cheatsheet. |
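
To make the relationship between the component metrics and `reward` concrete, here is a minimal sketch that assumes `reward` is a plain weighted sum of the component rewards, using the default weights from the arguments table. Whether the environment normalizes the sum or how it treats disabled components is not stated in this README, so `combine_rewards` is a hypothetical helper rather than the actual rubric code.

```python
from typing import Dict

# Default weights as listed in the Environment Arguments table.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "accuracy_reward": 1.0,
    "judge_reward": 1.0,
    "format_reward": 0.0,
}


def combine_rewards(
    components: Dict[str, float],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted sum over the components that were computed (an assumption)."""
    return sum(weights.get(name, 0.0) * value for name, value in components.items())


# judge_mode="none": only accuracy and format are scored in this example.
print(combine_rewards({"accuracy_reward": 1.0, "format_reward": 1.0}))  # 1.0

# With a judge score of 0.5 included.
print(combine_rewards({"accuracy_reward": 1.0, "judge_reward": 0.5, "format_reward": 1.0}))  # 1.5
```

Under these assumptions, the default `format_reward_weight` of `0.0` means a well-formed response contributes nothing to `reward` unless that weight is raised.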