Arousal EVAL RL Env (Openfarm)

OpenFARM visual affect benchmark for expert-coded zoo animal arousal and valence

Type: RL Env
Publisher: Openfarm
Runtime: single-turn
License: unknown
Size: v0.1.0
Published: May 2026

openfarm-zoo-arousal-eval


Overview

OpenFARM Zoo Arousal is a tiny visual affect eval built from expert-labeled zoo-animal video stimuli. It tests whether multimodal models can classify expert-coded arousal and/or valence from visible behavior, posture, movement, and facial/body cues.

  • Environment ID: openfarm-zoo-arousal-eval
  • Type: single-turn classification; becomes an EnvGroup when multiple tasks are selected
  • Default modality: filmstrip image
  • Other modalities: video, frames, text
  • Output format: XML answer, with optional explanation
  • Primary metric: exact normalized answer reward

The headline benchmark is visual-only. The prepared dataset uses muted clips because the source study frames the task as visual recognition from mute video clips. Source audio was audited during prep and should not be treated as the scientific signal for this env.

Dataset

Tasks

| Task | Label |
| --- | --- |
| arousal | low / high |
| valence | negative / neutral / positive |
| valence_arousal | negative_high / neutral_low / positive_low / positive_high |
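
As a rough illustration, the task table can be written as a lookup from task name to allowed labels. The names `TASK_LABELS` and `is_valid_label` are hypothetical and not part of the environment's code; only the label strings come from the table above.

```python
# Label sets copied from the task table; the names below are illustrative only.
TASK_LABELS = {
    "arousal": ("low", "high"),
    "valence": ("negative", "neutral", "positive"),
    "valence_arousal": (
        "negative_high",
        "neutral_low",
        "positive_low",
        "positive_high",
    ),
}


def is_valid_label(task: str, label: str) -> bool:
    """Return True when `label` is one of the labels listed for `task`."""
    return label in TASK_LABELS.get(task, ())
```

Note that the combined task lists four labels rather than the full 3 x 2 product, so a string such as negative_low is not a valid target in this sketch.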

Quickstart

prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": "arousal", "modality": "filmstrip", "max_examples": 5}'

Run valence and arousal as an EnvGroup:

prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": ["arousal", "valence"], "modality": "filmstrip"}'

Give the model nine separate sampled image inputs instead of one montage:

prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": "valence_arousal", "modality": "frames"}'

Use the muted prepared video directly on endpoints that support video input:

prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": "valence_arousal", "modality": "video"}'
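
When the commands above are generated from a script, quoting the JSON passed to -a by hand is error-prone. The following is a minimal sketch that assembles the same command line with the standard library; `build_eval_command` is a hypothetical helper, and only the `prime eval run <env> -a <json>` shape is taken from the examples above.

```python
import json
import shlex


def build_eval_command(env_id: str, env_args: dict) -> str:
    """Return a shell-safe `prime eval run` command with JSON environment args."""
    payload = json.dumps(env_args)
    parts = ["prime", "eval", "run", shlex.quote(env_id), "-a", shlex.quote(payload)]
    return " ".join(parts)


if __name__ == "__main__":
    # Prints a command equivalent to the EnvGroup example in the Quickstart.
    print(
        build_eval_command(
            "openfarm-zoo-arousal-eval",
            {"task": ["arousal", "valence"], "modality": "filmstrip"},
        )
    )
```

`shlex.quote` wraps the JSON in single quotes, matching the quoting style used in the Quickstart commands.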

Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| task | str/list | "arousal" | arousal, valence, valence_arousal, a list, or "all". |
| dataset_id | str | "oliveirabruno01/openfarm-zoo-valence-arousal" | Hugging Face dataset ID. |
| dataset_revision | str/null | null | Optional dataset revision. |
| test_split | str | "test" | Eval split. This dataset is eval-only. |
| max_examples | int | -1 | Optional subsampling budget. |
| seed | int | 42 | Shuffle seed before subsampling. |
| modality | str | "filmstrip" | filmstrip, video, frames, or text. |
| include_species_context | bool | false | Adds species as prompt text. Kept off by default for a purer visual task. |
| require_explanation | bool | false | Requires an <explanation> field before <answer>. |
| format_reward_weight | float | 0.0 | Optional XML format reward weight. |
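
The task, max_examples, and seed arguments could plausibly be handled as in the sketch below. This is not the environment's actual loader: `resolve_tasks` and `subsample` are hypothetical names, and the sketch assumes that max_examples = -1 means "keep every row, unshuffled", which the table does not state.

```python
import random

ALL_TASKS = ("arousal", "valence", "valence_arousal")


def resolve_tasks(task):
    """Expand the `task` argument (a name, a list of names, or "all") into a list."""
    if task == "all":
        return list(ALL_TASKS)
    tasks = [task] if isinstance(task, str) else list(task)
    unknown = [t for t in tasks if t not in ALL_TASKS]
    if unknown:
        raise ValueError(f"unknown task(s): {unknown}")
    return tasks


def subsample(rows, max_examples=-1, seed=42):
    """Shuffle with a fixed seed, then keep at most `max_examples` rows.

    Assumption: a negative budget disables subsampling and returns all rows
    in their original order.
    """
    rows = list(rows)
    if max_examples < 0:
        return rows
    random.Random(seed).shuffle(rows)
    return rows[:max_examples]
```

Using a dedicated `random.Random(seed)` instance keeps the subsample reproducible without touching the global random state.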

Dataset Notes

  • The dataset is intentionally eval-only; there is no meaningful train split.
  • The public rows use opaque media filenames and omit source filenames, clip codes, segment timestamps, expert notes, pre-rendered messages, task ids, and OpenFARM-specific row IDs.
  • video mode sends the muted prepared MP4 clip from embedded HF Video bytes. It is endpoint-dependent and intentionally has no local file fallback.
  • filmstrip mode sends the 3x3 montage as one image. It is the most portable vision path across current multimodal endpoints.
  • frames mode splits that same 3x3 filmstrip into nine separate image inputs, ordered left-to-right and top-to-bottom. This is useful for models such as Gemma 4 that can attend to multiple input images.
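
The frames ordering described above can be sketched as a crop-box computation. This assumes the montage is an evenly divided 3x3 grid with no borders or padding between tiles, which the notes do not confirm; `filmstrip_tile_boxes` is a hypothetical helper, not part of the environment.

```python
def filmstrip_tile_boxes(width: int, height: int, rows: int = 3, cols: int = 3):
    """Return (left, upper, right, lower) boxes, left-to-right then top-to-bottom."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            upper = r * height // rows
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


if __name__ == "__main__":
    # A 900x600 montage yields nine 300x200 tiles in reading order.
    for index, box in enumerate(filmstrip_tile_boxes(900, 600)):
        print(index, box)
```

Each tuple uses the (left, upper, right, lower) convention, so it can be passed directly to Pillow's `Image.crop` if that library is used to do the actual slicing.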

Metrics

| Metric | Meaning |
| --- | --- |
| accuracy_reward | 1.0 when the parsed answer matches the target label after normalization. |
| format_reward | Optional XML-format reward when format_reward_weight > 0. |
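
A minimal sketch of how an exact normalized answer reward might be computed from an XML-style completion follows. The environment's real parser and normalization rules are not shown in this document, so the details here are assumptions: the last <answer> tag wins, and normalization means trimming, lowercasing, and mapping spaces or hyphens to underscores. The function names are hypothetical.

```python
import re

# Matches the contents of every <answer>...</answer> pair in a completion.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def normalize(label: str) -> str:
    """Trim, lowercase, and turn runs of whitespace or hyphens into underscores."""
    return re.sub(r"[\s\-]+", "_", label.strip().lower())


def parse_answer(completion: str):
    """Return the normalized text of the last <answer> tag, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return normalize(matches[-1]) if matches else None


def accuracy_reward(completion: str, target: str) -> float:
    """Return 1.0 on an exact match after normalization, otherwise 0.0."""
    parsed = parse_answer(completion)
    return 1.0 if parsed is not None and parsed == normalize(target) else 0.0
```

Under these assumptions a completion with no <answer> tag scores 0.0 even if the correct label appears somewhere in free text.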