
Bioreasoning Phenotype RL Env (Abugoot)



Type: RL Env
Publisher: Abugoot
License: unknown
Size: v0.12.0
Published: May 2026

bioreasoning_phenotype

Pharmacology reasoning-chain env on small-molecule perturbations. The model is given a SMILES + cell line + assay context and must reason from chemical structure through to a cellular phenotype, in graded intermediate steps. Each step grades a different scientific question; rewards combine into a weighted aggregate. Optional tools give the model non-outcome lookup support for compound identity and Hallmark pathway ontology.

Current package version: 0.10.0. Hub env: abugoot/bioreasoning_phenotype. This README describes the environment interface and data construction. Project history and research results live in REPORT.md.

Chain

Full-chain examples ask for three upstream steps plus one downstream phenotype, which varies per example. Standalone curriculum lanes ask for one selected step. The grader checks only the answer tags; free-form reasoning between or around the tags is welcome but not required.

| Step | Answer tag | Grade against | Metric |
|---|---|---|---|
| 1. Target | <TARGET>SYM1\|SYM2</TARGET> | Drug Repurposing Hub target column | set F1 |
| 2. MoA | <MOA>HDAC inhibitor</MOA> | DRH moa column | normalized exact match |
| 3. Pathways | <PATHWAYS>HALLMARK_X:up, HALLMARK_Y:down, ...</PATHWAYS> | Top-5 Hallmark signed enrichment from L1000 MODZ | set F1 on (name, direction) tuples |
| 4. Phenotype | <<P>>label</<P>> | per-phenotype GT (see below) | per-phenotype scorer |
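As a rough illustration of how answer tags like these could be pulled out of a completion, here is a minimal sketch. The helper names (`extract_tag`, `parse_targets`, `parse_pathways`) are hypothetical, and taking the last occurrence of a tag, upper-casing gene symbols, and dropping malformed pathway items are assumptions, not documented grader behavior.

```python
import re
from typing import Optional, Set, Tuple


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped body of the last <TAG>...</TAG> pair, or None.

    Taking the last occurrence is an assumption of this sketch.
    """
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def parse_targets(text: str) -> Set[str]:
    """Split <TARGET>SYM1|SYM2</TARGET> into a set of gene symbols."""
    raw = extract_tag(text, "TARGET")
    if raw is None:
        return set()
    return {sym.strip().upper() for sym in raw.split("|") if sym.strip()}


def parse_pathways(text: str) -> Set[Tuple[str, str]]:
    """Split <PATHWAYS>NAME:up, NAME:down</PATHWAYS> into (name, direction) tuples."""
    raw = extract_tag(text, "PATHWAYS")
    pairs: Set[Tuple[str, str]] = set()
    if raw is None:
        return pairs
    for item in raw.split(","):
        name, sep, direction = item.strip().rpartition(":")
        direction = direction.strip().lower()
        # Items without a valid ":up" / ":down" suffix are skipped (assumption).
        if sep and direction in {"up", "down"}:
            pairs.add((name.strip(), direction))
    return pairs


completion = (
    "Some free-form reasoning. <TARGET>HDAC1|HDAC2</TARGET> "
    "<MOA>HDAC inhibitor</MOA> "
    "<PATHWAYS>HALLMARK_APOPTOSIS:up, HALLMARK_E2F_TARGETS:down</PATHWAYS>"
)
print(parse_targets(completion))
print(extract_tag(completion, "MOA"))
print(parse_pathways(completion))
```

Text outside the tags is ignored, which matches the statement that only the answer tags are graded.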

Downstream phenotype is one of (selected per example):

| Phenotype | Tag | Type | Labels / scoring |
|---|---|---|---|
| viability | <VIABILITY> | continuous LFC | piecewise-linear on absolute error; full credit within ±0.25, zero at absolute error ≥ 2.0 |
| cell_cycle | <CELL_CYCLE> | 3-class | arrest / no_effect / proliferation |
| stress | <STRESS> | 4-class | none / apoptosis / UPR / DNA_damage |
| magnitude | <MAGNITUDE> | 3-class | inert / moderate / strong |
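The viability scorer can be sketched as below. The two breakpoints (full credit up to 0.25, zero from 2.0) come from the table; a straight-line interpolation between them and the function names `viability_score` and `class_score` are assumptions of this sketch, as is the case-insensitive comparison for class labels.

```python
def viability_score(pred_lfc: float, true_lfc: float,
                    full_credit: float = 0.25, zero_credit: float = 2.0) -> float:
    """Piecewise-linear score on the absolute LFC error.

    1.0 when |err| <= full_credit, 0.0 when |err| >= zero_credit,
    and (assumed) linear in between.
    """
    err = abs(pred_lfc - true_lfc)
    if err <= full_credit:
        return 1.0
    if err >= zero_credit:
        return 0.0
    return (zero_credit - err) / (zero_credit - full_credit)


def class_score(pred_label: str, true_label: str) -> float:
    """Exact match for the categorical phenotypes (case-insensitive by assumption)."""
    return 1.0 if pred_label.strip().lower() == true_label.strip().lower() else 0.0


print(viability_score(-1.0, -1.1))   # inside the full-credit band
print(viability_score(0.0, 1.125))   # halfway down the linear ramp
print(viability_score(0.0, -2.5))    # beyond the zero-credit threshold
print(class_score("arrest", "arrest"))
```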

Default reward weights: target 0.15 / moa 0.15 / pathways 0.25 / phenotype 0.45. The aggregate only counts steps requested by the entry point (so e.g. from_pathways examples weight only the phenotype). format_compliance is a tracked metric, not part of the reward.
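One way to read "only counts steps requested by the entry point" is a weighted average renormalized over the requested steps. The sketch below assumes that reading; `aggregate_reward` here is an illustrative stand-in, not the packaged function, and treating a missing step score as 0.0 is also an assumption.

```python
DEFAULT_WEIGHTS = {"target": 0.15, "moa": 0.15, "pathways": 0.25, "phenotype": 0.45}


def aggregate_reward(step_scores, requested_steps, weights=DEFAULT_WEIGHTS):
    """Weighted average of per-step scores over the requested steps only.

    Weights are renormalized so the result stays in [0, 1] (assumption).
    """
    total_weight = sum(weights[step] for step in requested_steps)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[step] * step_scores.get(step, 0.0) for step in requested_steps)
    return weighted / total_weight


# Full chain: all four steps contribute.
full_chain = aggregate_reward(
    {"target": 0.5, "moa": 1.0, "pathways": 0.4, "phenotype": 1.0},
    ["target", "moa", "pathways", "phenotype"],
)
# from_pathways: only the phenotype is requested, so it carries all the weight.
phenotype_only = aggregate_reward({"phenotype": 0.8}, ["phenotype"])
print(round(full_chain, 3), round(phenotype_only, 3))
```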

Data substrate

  • LINCS L1000 Level 5 MODZ signatures over 6 cell lines (A375, A549, HEPG2, HT29, MCF7, PC3) — used for DEG ranking, signed Hallmark enrichment, transcriptional magnitude (‖z‖₂), cell-cycle / stress / magnitude bucket labels
  • PRISM Repurposing 24Q2 Extended Primary — continuous viability LFC per (compound × cell), averaged over full Broad sample IDs when the same 13-character compound ID appears multiple times
  • Drug Repurposing Hub (clue.io) — pert_iname, target, MoA, SMILES, InChIKey, PubChem CID
  • MSigDB Hallmark v2024.1 — 50 gene sets for pathway enrichment

Joined at the molecule/cell-line level: Drug Repurposing Hub samples collapse to the 13-character Broad compound ID, LINCS uses that short pert_id, PRISM full Broad IDs are averaged by the same short ID, and DepMap ACH IDs map PRISM cell lines to LINCS short cell-line codes. Exact physical-sample matching between LINCS and PRISM is only available for a tiny fraction of pairs, so the env does not treat sample/vendor/batch identity as model-visible context.
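The collapse to the 13-character compound ID and the averaging of PRISM values can be illustrated with a small stdlib-only sketch. The Broad IDs, cell-line code, and LFC values below are made up for illustration, and the function names are hypothetical; the real pipeline is in the data-prep scripts.

```python
from collections import defaultdict
from statistics import mean


def short_broad_id(full_id: str) -> str:
    """Keep the first 13 characters of a full Broad sample ID."""
    return full_id[:13]


def average_lfc_by_short_id(rows):
    """Average LFC per (short compound ID, cell line).

    rows: iterable of (full_broad_id, cell_line, lfc) tuples.
    """
    grouped = defaultdict(list)
    for full_id, cell_line, lfc in rows:
        grouped[(short_broad_id(full_id), cell_line)].append(lfc)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical rows: two physical samples of the same compound in one cell line.
prism_rows = [
    ("BRD-K12345678-001-01-1", "A375", -1.0),
    ("BRD-K12345678-001-02-9", "A375", -2.0),
    ("BRD-K87654321-001-01-1", "A375", 0.25),
]
averaged = average_lfc_by_short_id(prism_rows)
print(averaged)
```

Both samples of the first compound land under the same key, so the model-visible value is a molecule-level average rather than a specific physical sample.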

chain_gt contains 6,276 (compound × cell-line) pairs, of which 6,270 have the full upstream chain required for examples. Examples are skinny (one phenotype per sample) and cover full-chain, ablation, and standalone curriculum entry points. The train/test split is roughly 90% / 10% by compound hash, so train and test never share compounds.
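A compound-level hash split can be sketched as follows. The hash function (SHA-256), the bucketing scheme, and the name `split_for` are assumptions chosen for illustration; the actual rule lives in build_examples.py.

```python
import hashlib


def split_for(compound_id: str, test_fraction: float = 0.10) -> str:
    """Deterministically assign a compound to 'train' or 'test'.

    The compound ID is hashed and mapped to [0, 1); every example of the
    same compound therefore lands in the same split.
    """
    digest = hashlib.sha256(compound_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


# Hypothetical (compound, cell line) pairs: the split depends only on the compound.
pairs = [("BRD-K12345678", "A375"), ("BRD-K12345678", "MCF7"), ("BRD-K87654321", "PC3")]
for compound_id, cell_line in pairs:
    print(compound_id, cell_line, split_for(compound_id))
```

Because the assignment ignores the cell line, a compound cannot appear in both splits, which is the property the split is meant to guarantee.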

v0.10 prompt/data notes:

  • Prompts explicitly separate the LINCS expression assay context (dose_um, time_h) from the PRISM viability assay context (dose, 5-day endpoint, same compound/cell line but separate assay).
  • Viability prompts no longer say the PRISM LFC is "under this treatment condition", because that wrongly conflated the LINCS expression condition with the PRISM viability protocol.
  • Packaged examples carry audit/provenance columns for lincs_dose_um, lincs_time_h, prism_dose_um, prism_duration_h, prism_screens, prism_n_full_ids, and prism_aggregation.
  • Drug Repurposing Hub has multiple physical sample rows per short Broad ID, but retained compounds have unique pert_iname, target, MoA, and almost always a unique SMILES. The existing first-row collapse is therefore treated as a molecule-level convenience, not as physical sample provenance.
  • The main chain prompt does not include an inline Hallmark menu or extra concise/exact-tag instructions.
  • hallmark_tools=True remains available as an optional ontology-tool ablation, but the primary compound-tool setting leaves it disabled.

Entry points

Pre-fill upstream steps to measure scaffolding sensitivity:

  • smiles_only — model starts from SMILES, predicts all 4 steps (the RL training setting)
  • from_target — target is given as context, model predicts MoA / pathways / phenotype
  • from_moa — target + MoA given, model predicts pathways / phenotype
  • from_pathways — target + MoA + pathways given, model predicts phenotype only
  • phenotype_direct — no chain at all, just SMILES + cell + "predict phenotype"
  • target_from_smiles — standalone target prediction from SMILES
  • moa_from_smiles — standalone MoA prediction from SMILES
  • moa_from_target — standalone MoA prediction with target given
  • pathways_from_smiles — standalone signed-Hallmark prediction from SMILES
  • pathways_from_moa — standalone signed-Hallmark prediction with target + MoA given
  • phenotype_from_moa — phenotype prediction with target + MoA given
  • phenotype_from_pathways — phenotype prediction with signed pathways given

The first five are the original full-chain/scaffold diagnostic entry points. The standalone lanes were added for curriculum phases that refresh a specific part of the chain without requiring the model to solve every upstream step in the same rollout.
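The entry points above can be summarized as a lookup from name to the steps given as context and the steps the model is asked to predict. The `ENTRY_POINTS` dictionary is only a restatement of the list for illustration; its layout is hypothetical and not the package's internal representation.

```python
# (given steps, requested steps) per entry point, transcribed from the list above.
ENTRY_POINTS = {
    "smiles_only":             ([], ["target", "moa", "pathways", "phenotype"]),
    "from_target":             (["target"], ["moa", "pathways", "phenotype"]),
    "from_moa":                (["target", "moa"], ["pathways", "phenotype"]),
    "from_pathways":           (["target", "moa", "pathways"], ["phenotype"]),
    "phenotype_direct":        ([], ["phenotype"]),
    "target_from_smiles":      ([], ["target"]),
    "moa_from_smiles":         ([], ["moa"]),
    "moa_from_target":         (["target"], ["moa"]),
    "pathways_from_smiles":    ([], ["pathways"]),
    "pathways_from_moa":       (["target", "moa"], ["pathways"]),
    "phenotype_from_moa":      (["target", "moa"], ["phenotype"]),
    "phenotype_from_pathways": (["pathways"], ["phenotype"]),
}


def requested_steps(entry_point: str):
    """Steps that count toward the aggregate reward for this entry point."""
    return ENTRY_POINTS[entry_point][1]


for name, (given, requested) in ENTRY_POINTS.items():
    print(f"{name:24s} given={given} requested={requested}")
```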

load_environment args

load_environment(
    entry_points=None,          # list[str] | None — default: all entry points
    phenotypes=None,            # list[str] | None — default: all four
    cell_lines=None,            # list[str] | None — default: all 6
    num_train_examples=-1,      # int — -1 = all, else downsample
    num_eval_examples=-1,
    reward_weights=None,        # dict[str, float] — default below
    tools=False,                # if True, expose identify_compound via ToolEnv
    hallmark_tools=False,       # if True, expose describe_hallmark via ToolEnv
    max_tool_turns=5,           # bound tool-call rounds when any tool is enabled
)

Default reward weights:

{"target": 0.15, "moa": 0.15, "pathways": 0.25, "phenotype": 0.45}

Rubric

One reward function + eight tracked metrics:

| Function | Type | Range |
|---|---|---|
| aggregate_reward | reward (weight 1.0) | 0–1 weighted average over requested steps |
| target_f1 | metric | 0–1 set F1 on gene symbols |
| moa_accuracy | metric | 0 or 1 normalized exact match |
| pathway_signed_f1 | metric | 0–1 set F1 on (name, direction) tuples |
| pathway_name_validity | metric | fraction of parsed pathway predictions using exact canonical Hallmark names |
| pathway_name_f1 | metric | 0–1 set F1 on pathway names, ignoring direction |
| pathway_direction_accuracy | metric | direction accuracy among exact pathway-name overlaps |
| phenotype_score | metric | per-phenotype (LFC piecewise-linear, else exact match) |
| format_compliance | metric | 0–1 fraction of requested answer tags present |
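The three pathway metrics differ only in what is compared. A minimal sketch, assuming standard set-based precision and recall, is shown below. The function names are illustrative, and returning 0.0 when there is no overlap or an empty set is an assumption rather than documented behavior.

```python
def set_f1(pred: set, gold: set) -> float:
    """Harmonic mean of set precision and recall; 0.0 when there is no overlap."""
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def pathway_metrics(pred: set, gold: set) -> dict:
    """Signed F1, name-only F1, and direction accuracy on shared names.

    pred and gold are sets of (pathway_name, direction) tuples.
    """
    gold_direction = dict(gold)
    shared = [(name, direction) for name, direction in pred if name in gold_direction]
    correct = sum(direction == gold_direction[name] for name, direction in shared)
    return {
        "pathway_signed_f1": set_f1(pred, gold),
        "pathway_name_f1": set_f1({n for n, _ in pred}, {n for n, _ in gold}),
        "pathway_direction_accuracy": correct / len(shared) if shared else 0.0,
    }


# Placeholder pathway names: A is right, B has the wrong sign, C is not in the gold set.
pred = {("A", "up"), ("B", "down"), ("C", "up")}
gold = {("A", "up"), ("B", "up")}
metrics = pathway_metrics(pred, gold)
print(metrics)
```

In this example the name-level F1 is higher than the signed F1, which is the kind of gap the separate metrics are meant to expose: the model found the right pathways but got one direction wrong.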

Tools (optional)

tools=True exposes the compound lookup:

identify_compound(smiles: str) -> {
    "exact_match": {"name": "..."} | None,        # InChIKey-canonical lookup in our 1046 compounds
    "descriptors": {                              # rdkit physicochemical
        "molecular_weight": ..., "logp": ..., "tpsa": ...,
        "num_h_bond_donors": ..., "num_h_bond_acceptors": ...,
        "num_rings": ..., "num_aromatic_rings": ..., "num_rotatable_bonds": ...,
    },
    "scaffold": "<canonical SMILES of Bemis-Murcko scaffold>",
    "nearest_neighbors": [
        {"name": "<drug name>", "similarity": 0.xx},  # top-5 Tanimoto
        ...
    ]
}

Returns drug name + structural context without leaking target / MoA / pathways. For known compounds, exact_match gives the name directly; for novel/perturbed SMILES, the nearest-neighbor list surfaces analog hints (e.g. an unknown adrenergic returns "dobutamine at 0.80 similarity").
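The nearest-neighbor ranking can be sketched with plain sets standing in for fingerprints. This is only an illustration of Tanimoto similarity and top-k selection: the fingerprint type used by the tool is not stated here, the real tool relies on rdkit, and the library names and bit sets below are made up.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbors(query_bits: frozenset, library: dict, k: int = 5) -> list:
    """Return the k most similar library entries as name/similarity records."""
    scored = [
        {"name": name, "similarity": round(tanimoto(query_bits, bits), 2)}
        for name, bits in library.items()
    ]
    scored.sort(key=lambda record: record["similarity"], reverse=True)
    return scored[:k]


# Hypothetical fingerprints: each compound is a set of on-bit indices.
library = {
    "drug_a": frozenset({1, 2, 3, 4}),
    "drug_b": frozenset({1, 2, 9}),
    "drug_c": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 5})
neighbors = nearest_neighbors(query, library, k=2)
print(neighbors)
```

Only names and similarities are returned, in line with the tool's goal of giving structural context without exposing target, MoA, or pathway labels.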

hallmark_tools=True exposes ontology metadata only:

describe_hallmark(pathway: str, max_genes: int = 25) -> {
    "canonical_name": "HALLMARK_ANDROGEN_RESPONSE",
    "matched_from": "androgen signaling",
    "msigdb_url": "...",
    "gene_count": 101,
    "member_genes_sample": ["ABCC4", "ABHD2", "..."],
    "aliases": ["ANDROGEN_SIGNALING", "..."],
}

This deliberately does not return compound-specific pathway scores, observed directions, or phenotype labels. It is intended to reduce vocabulary/ontology errors without turning the pathway step into retrieval of the answer.
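The name-resolution part of such a tool might look like the sketch below, which maps a free-text query to a canonical Hallmark name through normalization and an alias table. The tiny `CANONICAL` and `ALIASES` tables contain only the example from the output above, and `resolve_hallmark` is a hypothetical helper, not the tool's actual matching logic.

```python
import re
from typing import Optional

# Minimal stand-in tables built from the example output above.
CANONICAL = {"HALLMARK_ANDROGEN_RESPONSE"}
ALIASES = {"ANDROGEN_SIGNALING": "HALLMARK_ANDROGEN_RESPONSE"}


def normalize(name: str) -> str:
    """Upper-case and collapse runs of non-alphanumerics to underscores."""
    return re.sub(r"[^A-Z0-9]+", "_", name.upper()).strip("_")


def resolve_hallmark(query: str) -> Optional[str]:
    """Map a free-text pathway name to a canonical Hallmark name, or None."""
    key = normalize(query)
    if key in CANONICAL:
        return key
    if "HALLMARK_" + key in CANONICAL:
        return "HALLMARK_" + key
    return ALIASES.get(key)


print(resolve_hallmark("androgen signaling"))
print(resolve_hallmark("Androgen response"))
print(resolve_hallmark("not a pathway"))
```

Nothing in this lookup depends on the compound or cell line, which is what keeps the tool at the ontology level.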

Local usage

Evals run via prime eval run against the Hub-published env (abugoot/bioreasoning_phenotype). Auth is handled by prime login, so no PRIME_API_KEY env var is needed. Both prime eval run and vf-eval execute the same code path; prime eval run additionally handles auth, billing preflight, and model-registry validation, and automatically uploads results so that they appear in prime eval list.

uv pip install -e .

# diagnostic eval (default = all entry points × 4 phenotypes, random sample)
prime eval run abugoot/bioreasoning_phenotype@0.10.0 \
  -m openai/gpt-4.1-mini \
  -n 32 -r 1 -a '{}' --max-tokens 16384

# full-chain eval (smiles_only only)
prime eval run abugoot/bioreasoning_phenotype@0.10.0 \
  -m openai/gpt-4.1-mini \
  -n 100 -r 1 -a '{"entry_points":["smiles_only"]}' \
  --max-tokens 16384

# select a phenotype subset
prime eval run abugoot/bioreasoning_phenotype@0.10.0 \
  -m openai/gpt-4.1-mini \
  -n 100 -r 1 \
  -a '{"entry_points":["smiles_only"], "phenotypes":["viability","cell_cycle","stress"]}' \
  --max-tokens 16384

# with compound retrieval only
prime eval run abugoot/bioreasoning_phenotype@0.10.0 \
  -m openai/gpt-4.1-mini \
  -n 100 -r 1 \
  -a '{"entry_points":["smiles_only"], "tools": true, "max_tool_turns": 5}' \
  --max-tokens 16384

# with compound retrieval + Hallmark ontology tool
prime eval run abugoot/bioreasoning_phenotype@0.10.0 \
  -m openai/gpt-4.1-mini \
  -n 100 -r 1 \
  -a '{"entry_points":["smiles_only"], "tools": true, "hallmark_tools": true, "max_tool_turns": 5}' \
  --max-tokens 16384

Data prep (one-time, dev only)

Scripts in scripts/ build the bundled data/smallmol_chain_examples.parquet and data/compound_table.parquet:

  1. fetch_smallmol.py — download L1000 GCTX, PRISM CSV, Drug Repurposing Hub, Hallmark GMT
  2. build_compound_table.py — InChIKey-canonical compound × cell line table
  3. filter_l1000.py — extract MODZ signatures for our compound set
  4. compute_chain_gt.py — per-pair GT (target, MoA, signed pathways, cell_cycle / stress / magnitude buckets, viability LFC)
  5. build_examples.py — expand to all entry points × phenotypes, split train/test by compound hash
  6. smoke.py — local sanity checks

The data-prep extra (pip install -e ".[data-prep]") installs cmapPy, h5py, openpyxl, scipy. (rdkit is in the runtime deps because the identify_compound tool uses it.)