scientific-literature-review

This environment tests an agent's ability to perform a basic scientific literature review: identifying key papers, extracting relevant information, and synthesizing findings. It evaluates how well the agent navigates scientific databases and interprets research abstracts.

Overview

Domain: science
Base Class: ToolEnv
Difficulty: medium
Task: The model must identify relevant scientific papers based on a given research question, extract specific data points (e.g., methodology, key findings), and summarize the collective insights from the selected literature.

Quickstart

Installation

uv run vf-install scientific-literature-review

Usage

import verifiers as vf

env = vf.load_environment("scientific-literature-review")
results = env.evaluate_sync(
    client=vf.OpenAI(),
    model="gpt-4.1-mini",
    num_examples=10,
    rollouts_per_example=1
)

Evaluation

Run an evaluation with default settings:

uv run vf-eval scientific-literature-review

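Configure the model and sampling parameters (`-m` model, `-n` number of examples, `-r` rollouts per example, `-t` max tokens, `-T` temperature):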

uv run vf-eval scientific-literature-review \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7

Environment Arguments

Arg	Type	Default	Description
`num_examples`	int	1000	Number of training examples
`num_eval_examples`	int	100	Number of evaluation examples
`seed`	int	42	Random seed for reproducibility
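
As a rough illustration of how these three arguments might interact, the sketch below draws disjoint training and evaluation sets from a pool using a seeded shuffle. The function `split_examples` and the integer pool are hypothetical; the source does not show how the environment actually builds its datasets.

```python
import random


def split_examples(pool, num_examples=1000, num_eval_examples=100, seed=42):
    """Hypothetical helper: seeded, disjoint train/eval split of a pool."""
    if num_examples + num_eval_examples > len(pool):
        raise ValueError("pool is too small for the requested split")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    shuffled = list(pool)
    rng.shuffle(shuffled)
    train = shuffled[:num_examples]
    evaluation = shuffled[num_examples:num_examples + num_eval_examples]
    return train, evaluation


# Placeholder pool of example IDs (assumption: the real pool holds research questions).
pool = range(2000)
train, evaluation = split_examples(pool)
train_again, evaluation_again = split_examples(pool)
```

With the same `seed`, repeated calls return identical splits, which is the property the `seed` argument is described as providing.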

Metrics

Metric	Meaning
`reward`	Primary reward signal
`format_reward`	Format adherence reward (if applicable)
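
One way to read these metrics after a run is to average them over rollouts. The sketch below assumes each rollout yields a dict keyed by the metric names above; that record shape and the values are placeholders for illustration, not the actual output format of `vf-eval`.

```python
from statistics import mean


def summarize(rollouts, metrics=("reward", "format_reward")):
    """Average each metric over the rollouts that report it."""
    summary = {}
    for name in metrics:
        values = [r[name] for r in rollouts if name in r]
        if values:  # format_reward may be absent when it does not apply
            summary[name] = mean(values)
    return summary


# Placeholder values, not real results.
rollouts = [
    {"reward": 1.0, "format_reward": 1.0},
    {"reward": 0.5, "format_reward": 1.0},
    {"reward": 0.0},
]
summary = summarize(rollouts)
```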

About

Generated by synthetic-rl-env-creator.

Tags: science