Natural language processing (NLP) has significantly influenced scientific domains beyond human language, including protein engineering, where pre-trained protein language models (PLMs) have demonstrated remarkable success. However, interdisciplinary adoption remains limited by challenges in data collection, task benchmarking, and application. This work presents VenusFactory, a versatile engine that integrates biological data retrieval, standardized task benchmarking, and modular fine-tuning of PLMs. VenusFactory serves both the computer science and biology communities, offering either command-line execution or a Gradio-based no-code interface, and integrates 40+ protein-related datasets and 40+ popular PLMs. All implementations are open-sourced at https://github.com/tyang816/VenusFactory.
VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning
VenusFactory is a versatile application that supports integrating, benchmarking, and fine-tuning protein language models on a range of datasets, and is accessible to both the computer science and biology communities.
- Year: 2025
- Venue: arXiv 2025
- Authors: 11
- Hosting: Abstract only
Attribution
- Abstract & full text: arxiv.org/abs/2503.15438
- TL;DR: Semantic Scholar