# scientific-literature-review
This environment tests an agent's ability to perform a basic scientific literature review, identifying key papers, extracting relevant information, and synthesizing findings. It evaluates the agent's proficiency in navigating scientific databases and understanding research abstracts.
## Overview

- **Domain**: science
- **Base class**: `ToolEnv`
- **Difficulty**: medium
- **Task**: The model must identify relevant scientific papers for a given research question, extract specific data points (e.g., methodology, key findings), and summarize the collective insights from the selected literature.
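To make the task's three steps (identify, extract, synthesize) concrete, here is a minimal sketch that models each extracted paper as a record and groups findings by methodology as a crude form of synthesis. It is an illustration under assumptions only: `PaperRecord` and `synthesize` are hypothetical names that are not part of this environment or of `verifiers`, and the sample papers are placeholders, not real literature.

```python
from dataclasses import dataclass, field


@dataclass
class PaperRecord:
    """Hypothetical record for one extracted paper; field names are illustrative."""
    title: str
    methodology: str
    key_findings: list[str] = field(default_factory=list)


def synthesize(records: list[PaperRecord]) -> dict[str, list[str]]:
    """Group findings by methodology, tagging each finding with its paper title."""
    by_method: dict[str, list[str]] = {}
    for rec in records:
        for finding in rec.key_findings:
            by_method.setdefault(rec.methodology, []).append(f"{rec.title}: {finding}")
    return by_method


# Placeholder papers, used only to exercise the sketch.
papers = [
    PaperRecord("Paper A", "randomized controlled trial", ["effect observed"]),
    PaperRecord("Paper B", "meta-analysis", ["effect pooled across studies"]),
    PaperRecord("Paper C", "randomized controlled trial", ["no effect observed"]),
]
summary = synthesize(papers)
print(summary)
```

Grouping by methodology is one simple way to surface agreement or disagreement between studies; a real synthesis would weigh study quality and sample size as well.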
## Quickstart

### Installation

```bash
uv run vf-install scientific-literature-review
```
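### Usage

```python
import verifiers as vf
from openai import OpenAI

# Load the installed environment by its ID.
env = vf.load_environment("scientific-literature-review")

# Evaluate 10 examples with one rollout each.
# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
results = env.evaluate_sync(
    client=OpenAI(),
    model="gpt-4.1-mini",
    num_examples=10,
    rollouts_per_example=1,
)
```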
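## Evaluation

Run an evaluation with default settings:

```bash
uv run vf-eval scientific-literature-review
```

Configure the model and sampling parameters (`-m` model, `-n` number of examples, `-r` rollouts per example, `-t` max tokens, `-T` temperature):

```bash
uv run vf-eval scientific-literature-review \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7
```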
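With `-n 20 -r 3`, each of the 20 examples is rolled out 3 times, for 60 rollouts in total. The sketch below spells out that flag-to-parameter mapping with a hypothetical `argparse` parser that mirrors the short flags used above. It is an illustration under that assumption, not `vf-eval`'s own parser, and the `sampling_args` dictionary is likewise only illustrative.

```python
import argparse

# Hypothetical parser mirroring the vf-eval short flags shown above;
# it is NOT vf-eval's actual implementation.
parser = argparse.ArgumentParser(prog="vf-eval-sketch")
parser.add_argument("env_id")
parser.add_argument("-m", "--model")
parser.add_argument("-n", "--num-examples", type=int)
parser.add_argument("-r", "--rollouts-per-example", type=int)
parser.add_argument("-t", "--max-tokens", type=int)
parser.add_argument("-T", "--temperature", type=float)

args = parser.parse_args(
    ["scientific-literature-review", "-m", "gpt-4.1-mini",
     "-n", "20", "-r", "3", "-t", "1024", "-T", "0.7"]
)

# Every example is rolled out independently r times.
total_rollouts = args.num_examples * args.rollouts_per_example
sampling_args = {"max_tokens": args.max_tokens, "temperature": args.temperature}
print(args.model, total_rollouts, sampling_args)
```

The same values correspond to the `model`, `num_examples`, and `rollouts_per_example` keyword arguments in the Python usage example.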
## Environment Arguments

| Arg | Type | Default | Description |
|---|---|---|---|
| `num_examples` | int | `1000` | Number of training examples |
| `num_eval_examples` | int | `100` | Number of evaluation examples |
| `seed` | int | `42` | Random seed for reproducibility |
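The table does not say how `seed` is applied. One common pattern, shown here purely as an assumption, is to shuffle a question pool with a local `random.Random(seed)` and slice off disjoint training and evaluation sets, so the same seed always yields the same split. `split_examples` and the placeholder pool are hypothetical; this is not the environment's actual dataset loader.

```python
import random


def split_examples(pool, num_examples=1000, num_eval_examples=100, seed=42):
    """Hypothetical reproducible train/eval split (not this environment's loader)."""
    if num_examples + num_eval_examples > len(pool):
        raise ValueError("pool is too small for the requested split")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    shuffled = list(pool)
    rng.shuffle(shuffled)
    train = shuffled[:num_examples]
    eval_set = shuffled[num_examples:num_examples + num_eval_examples]
    return train, eval_set


# Placeholder pool of 1100 question IDs.
pool = [f"question-{i}" for i in range(1100)]
train_a, eval_a = split_examples(pool)
train_b, eval_b = split_examples(pool)
assert (train_a, eval_a) == (train_b, eval_b)  # same seed, same split
```

Using a dedicated `Random` instance rather than `random.seed` keeps the split reproducible even if other code draws from the global generator.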
## Metrics

| Metric | Meaning |
|---|---|
| `reward` | Primary reward signal |
| `format_reward` | Format adherence reward (if applicable) |
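How `reward` relates to `format_reward` is not specified here. Assuming the environment follows the common rubric pattern in which the total reward is a weighted sum of individual reward functions, a minimal sketch looks like this. The scoring functions and the weights `1.0` and `0.2` are hypothetical stand-ins, not this environment's actual rubric.

```python
from typing import Callable, Sequence

RewardFn = Callable[[str], float]


def format_reward(completion: str) -> float:
    """Hypothetical format check: 1.0 if the answer has a 'Summary:' section."""
    return 1.0 if "Summary:" in completion else 0.0


def coverage_reward(completion: str) -> float:
    """Hypothetical content check: share of required terms the answer mentions."""
    required = ("methodology", "findings")
    text = completion.lower()
    return sum(term in text for term in required) / len(required)


def total_reward(completion: str, funcs: Sequence[RewardFn], weights: Sequence[float]) -> float:
    """Weighted sum of the individual reward functions."""
    if len(funcs) != len(weights):
        raise ValueError("funcs and weights must have the same length")
    return sum(w * f(completion) for f, w in zip(funcs, weights))


funcs = [coverage_reward, format_reward]
weights = [1.0, 0.2]  # hypothetical weights

full = "Methodology: two RCTs. Findings: mixed.\nSummary: evidence is inconclusive."
partial = "Findings: mixed."
print(total_reward(full, funcs, weights))
print(total_reward(partial, funcs, weights))
```

Giving the format term a small weight rewards well-structured answers without letting formatting outweigh the substance of the review.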
## About

Generated by synthetic-rl-env-creator.

Tags: science