Bo Wang

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

arXiv 2026

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

arXiv 2026

BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

arXiv 2026

MedRAX: Medical Reasoning Agent for Chest X-ray

arXiv 2025

MedSAM2: Segment Anything in 3D Medical Images and Videos

arXiv 2025

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

arXiv 2025

A Survey on Efficient Vision-Language-Action Models

arXiv 2025

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

arXiv 2025

REARANK: Reasoning Re-ranking Agent via Reinforcement Learning

arXiv 2025

ASM-UNet: Adaptive Scan Mamba Integrating Group Commonalities and Individual Variations for Fine-Grained Segmentation

arXiv 2025

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

arXiv 2025

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

arXiv 2025

LightFC-X: Lightweight Convolutional Tracker for RGB-X Tracking

arXiv 2025

Multi-hop Reasoning via Early Knowledge Alignment

arXiv 2025

Segment Anything in Medical Images and Videos: Benchmark and Deployment

arXiv 2024

ECG-FM: An Open Electrocardiogram Foundation Model

arXiv 2024

MassSpecGym: A benchmark for the discovery and identification of molecules

arXiv 2024

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models

arXiv 2024

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark

arXiv 2024

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

arXiv 2024

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

arXiv 2024

SceneTracker: Long-term Scene Flow Estimation Network

arXiv 2024

Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

arXiv 2024

BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments

arXiv 2024

Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models

arXiv 2024

Rethinking Multi-view Representation Learning via Distilled Disentangling

CVPR 2024