bioreason-pro-rl-reasoning-data
Description
bioreason-pro-rl-reasoning-data is an environment for evaluating protein function prediction through Gene Ontology (GO) term annotation. It is based on the [BioReason-Pro](https://github.com/bowang-lab/BioReason-Pro) dataset [wanglab/bioreason-pro-rl-reasoning-data](https://huggingface.co/datasets/wanglab/bioreason-pro-rl-reasoning-data). Agents receive protein metadata including amino acid sequence, organism, InterPro domain annotations, protein-protein interaction partners, and initial GO term speculations from GO-GPT. Following the original BioReason-Pro reasoning pipeline, agents must reason about the protein's function and predict the correct set of GO terms. Scoring uses deterministic set-based F1, comparing predicted GO term IDs against ground-truth annotations curated from UniProt with experimental evidence.
Capabilities
- Protein function prediction via Gene Ontology term annotation
- Multimodal biological reasoning across sequence, domain, and interaction data
- Hierarchical classification across three GO aspects (Molecular Function, Biological Process, Cellular Component)
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
Tasks are organized into four splits:
| Split | Description | Tasks |
|---|---|---|
| train | All GO aspects combined | 9,197 |
| mf | Molecular Function predictions only | 4,914 |
| bp | Biological Process predictions only | 5,881 |
| cc | Cellular Component predictions only | 5,610 |
Each task provides the agent with protein ID, name, organism, sequence, subcellular location, InterPro domain annotations, protein-protein interaction partners, and initial GO term speculations from GO-GPT. Following the original BioReason-Pro reasoning template, agents are asked to reason about the protein's function and predict GO term IDs for the specified aspect(s).
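As a rough illustration of the task inputs listed above, the sketch below models one task as a Python dataclass and renders it into a prompt. The class name `ProteinTask`, the field names, the `render_prompt` helper, and the prompt layout are all hypothetical; they are not taken from the environment or the BioReason-Pro template, and the example values are placeholders rather than real dataset records.

```python
from dataclasses import dataclass, field

# Short codes for the three GO aspects named in this README.
ASPECT_NAMES = {
    "MF": "Molecular Function",
    "BP": "Biological Process",
    "CC": "Cellular Component",
}


@dataclass
class ProteinTask:
    """Hypothetical container for the per-task inputs described above."""
    protein_id: str
    name: str
    organism: str
    sequence: str
    subcellular_location: str
    interpro_domains: list[str] = field(default_factory=list)
    interaction_partners: list[str] = field(default_factory=list)
    go_gpt_speculations: list[str] = field(default_factory=list)
    # The train split covers all aspects; mf/bp/cc cover one aspect each.
    aspects: tuple[str, ...] = ("MF", "BP", "CC")


def render_prompt(task: ProteinTask) -> str:
    """Render the task as plain text (assumed layout, not the real template)."""
    aspect_text = ", ".join(ASPECT_NAMES[a] for a in task.aspects)
    lines = [
        f"Protein: {task.protein_id} ({task.name})",
        f"Organism: {task.organism}",
        f"Sequence: {task.sequence}",
        f"Subcellular location: {task.subcellular_location}",
        "InterPro domains: " + "; ".join(task.interpro_domains),
        "Interaction partners: " + "; ".join(task.interaction_partners),
        "GO-GPT speculations: " + "; ".join(task.go_gpt_speculations),
        f"Predict GO term IDs for: {aspect_text}",
    ]
    return "\n".join(lines)


# Placeholder values for illustration only.
example = ProteinTask(
    protein_id="EXAMPLE_1",
    name="Example kinase",
    organism="Homo sapiens",
    sequence="MKTAYIAKQR",
    subcellular_location="Cytoplasm",
    interpro_domains=["IPR000719"],
    interaction_partners=["EXAMPLE_2"],
    go_gpt_speculations=["GO:0004672"],
    aspects=("MF",),
)
prompt = render_prompt(example)
```

Keeping the requested aspects on the record itself means the same structure can represent both the combined `train` split and the single-aspect `mf`, `bp`, and `cc` splits.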
Reward Structure
This is a sparse, verifiable reward environment with deterministic scoring:
- Agent receives protein information and the GO aspect(s) to predict
- Agent submits predicted GO terms via the `answer` tool
- GO term IDs (pattern `GO:XXXXXXX`) are extracted from the response
- Set-based F1 is computed against ground truth:
  - Precision = |predicted ∩ truth| / |predicted|
  - Recall = |predicted ∩ truth| / |truth|
  - F1 = 2 * precision * recall / (precision + recall)
- Continuous reward in [0, 1]
No LLM grader is used.
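The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual grader: the function names are hypothetical, the regular expression assumes that `GO:XXXXXXX` means `GO:` followed by exactly seven digits, and returning 0 for an empty prediction set or zero overlap is an assumption made to avoid division by zero.

```python
import re

# Assumed reading of the GO:XXXXXXX pattern: "GO:" plus exactly seven digits.
GO_ID_PATTERN = re.compile(r"\bGO:\d{7}\b")


def extract_go_terms(response: str) -> set[str]:
    """Collect the unique GO term IDs that appear in a free-text response."""
    return set(GO_ID_PATTERN.findall(response))


def set_f1(predicted: set[str], truth: set[str]) -> float:
    """Set-based F1 between predicted and ground-truth GO term IDs."""
    overlap = len(predicted & truth)
    # Assumption: an empty side or zero overlap scores 0 rather than raising.
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


# Illustrative response and ground truth (not taken from the dataset).
response = "Likely a kinase: GO:0004672, GO:0005524, and GO:0005737."
truth = {"GO:0004672", "GO:0005524", "GO:0006468"}

predicted = extract_go_terms(response)
reward = set_f1(predicted, truth)
print(f"reward = {reward:.3f}")  # 2 of 3 predictions correct, 2 of 3 truths found: 0.667
```

Because both sides are treated as sets, repeating a GO ID in the response does not change the score, and each wrong ID lowers precision while each missed ID lowers recall.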
Data
Data is sourced from the wanglab/bioreason-pro-rl-reasoning-data dataset containing 9,197 proteins with experimental GO annotations curated from UniProt. Each protein includes amino acid sequence, InterPro domain annotations, protein-protein interaction partners, and pre-computed GO-GPT predictions. Task data is stored on the OpenReward platform.
Tools
| Tool | Description |
|---|---|
| answer | Submit predicted GO terms for F1-based scoring |
Time Horizon
Single-turn. Each task is evaluated in a single interaction.
Environment Difficulty
BioReason-Pro model performance on GO term prediction (F_max):
| Model | F_max |
|---|---|
| BioReason-Pro RL (Qwen3-4B + ESM3 + GRPO) | 73.6% |
Other Environment Requirements
There are no further environment requirements. BioReason-Pro uses deterministic F1 scoring and does not require any external API keys.
Safety
Agents in BioReason-Pro answer protein function prediction questions in a standard environment. The environment does not provide tools for code execution, web access, or file system modification.
Protein function prediction is a dual-use capability. Accurate functional annotation of proteins could in principle be applied to characterize proteins involved in pathogenicity, toxin production, or antimicrobial resistance. However, the tasks in this environment operate on publicly available UniProt annotations and GO terms, and the predictions are limited to categorical ontology labels rather than actionable design instructions. The risk profile is comparable to querying existing protein databases.
Citation
@article{fallahpour2026bioreason_pro,
title={BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning},
author={Fallahpour, Adibvafa and Seyed-Ahmadi, Arman and Idehpour, Parsa and Ibrahim, Omar and Gupta, Purav and Naimer, Jack and Zhu, Kevin and Shah, Arnav and Ma, Shihao and Adduri, Abhinav and G{\"u}loglu, Talu and Liu, Nuo and Cui, Haotian and Jain, Arihant and de Castro, Max and Fallahpour, Amirfaham and Cembellin-Prieto, Antonio and Stiles, John S. and Nem{\v{c}}ko, Filip and Nevue, Alexander A. and Moon, Hyungseok C. and Sosnick, Lucas and Markham, Olivia and Duan, Haonan and Lee, Michelle Y. Y. and Salvador, Andrea F. M. and Maddison, Chris J. and Thaiss, Christoph A. and Ricci-Tam, Chiara and Plosky, Brian S. and Burke, Dave P. and Hsu, Patrick D. and Goodarzi, Hani and Wang, Bo},
journal={bioRxiv},
year={2026},
doi={10.64898/2026.03.19.712954}
}