ob1 is an RL env contributor.
Reward hacking sprint environment for exact contrastive-reframing proxy pressure in helpfulness tasks.
A bilingual adversarial benchmark for auditing demographic safety alignment and jailbreak vulnerabilities in LLMs.
Verifiers port of the BRFauna eval suite.
Generative question-answering environment based on the MedPT dataset of Portuguese medical questions.
Verifiers port of the MedPT dataset.
Wrapper that adapts any Verifiers env to the RLM interface.
Reinforcement Learning from Rendering Feedback (RLRF) environment. Image-to-SVG generation.
Verifiers wrapper for the DatBench evaluation library.
Image-to-SVG generation benchmark.
A minimal subset of the PoETa v2 benchmark focusing on native Portuguese tasks (NLI, STS, QA, proverbs, toxicity).
A science question answering environment for evaluating scientific reasoning and problem-solving capabilities.
Evaluates multimodal document parsing (OCR, layout, formulas, tables) by converting document images to structured Markdown. Ported from OmniDocBench.
Verifiers environment for the Fox benchmark for fine-grained multi-page document understanding.
Evaluates a model's ability to perform base64 encoding and decoding across a variety of text and data formats.
A multi-turn agent environment from ACEBench that evaluates a model's ability to perform complex, sequential tool-use tasks to reach a correct fina...
A GIS-based environment where models classify geographic grid cells as land or water, evaluated on pixel accuracy and overall landmass IoU.
A single-turn reasoning environment based on the SimpleBench dataset, where models are evaluated on their ability to answer multiple-choice questions.