Bo Li
- Papers: 100
Authored papers (100)
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
arXiv 2026
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
arXiv 2026
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
arXiv 2026
Internal Safety Collapse in Frontier Large Language Models
arXiv 2026
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
arXiv 2026
4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
arXiv 2026
Mixture of Style Experts for Diverse Image Stylization
arXiv 2026
A Very Big Video Reasoning Suite
arXiv 2026
RewardHarness: Self-Evolving Agentic Post-Training
arXiv 2026
Demystifying Video Reasoning
arXiv 2026
One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue
arXiv 2026
Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
arXiv 2026
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
arXiv 2026
A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
arXiv 2026
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
arXiv 2026
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
arXiv 2026
EgoLife: Towards Egocentric Life Assistant
CVPR 2025
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
arXiv 2025
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
arXiv 2025
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
arXiv 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
arXiv 2025
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions
arXiv 2025
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
arXiv 2025
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models
arXiv 2025
Scaling Spatial Intelligence with Multimodal Foundation Models
arXiv 2025
Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making
arXiv 2025
Visual Generation Tuning
arXiv 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
arXiv 2025
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
arXiv 2025
MMSearch-R1: Incentivizing LMMs to Search
arXiv 2025
Visual Jigsaw Post-Training Improves MLLMs
arXiv 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
arXiv 2025
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
arXiv 2025
Adaptive Data Exploitation in Deep Reinforcement Learning
arXiv 2025
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants
arXiv 2025
Step-Audio 2 Technical Report
arXiv 2025
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
arXiv 2025
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
arXiv 2025
Safety at Scale: A Comprehensive Survey of Large Model Safety
arXiv 2025
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
arXiv 2025
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
arXiv 2025
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
arXiv 2025
IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector
arXiv 2025
Reliable and Efficient Amortized Model-based Evaluation
arXiv 2025
LLaVA-OneVision: Easy Visual Task Transfer
arXiv 2024
FullStack Bench: Evaluating LLMs as Full Stack Coders
arXiv 2024
BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement
arXiv 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
arXiv 2024
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
ICCV 2025
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
arXiv 2024
Differentially Private Synthetic Data via Foundation Model APIs 2: Text
arXiv 2024
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
arXiv 2024
Introducing v0.5 of the AI Safety Benchmark from MLCommons
arXiv 2024
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
arXiv 2024
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
Long Context Transfer from Language to Vision
arXiv 2024
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
arXiv 2024
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
arXiv 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
arXiv 2024
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
arXiv 2024
Revolutionizing Database Q&A with Large Language Models: Comprehensive Benchmark and Evaluation
arXiv 2024
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
arXiv 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
arXiv 2024
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
arXiv 2024
COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits
arXiv 2024
ASAM: Boosting Segment Anything Model with Adversarial Tuning
CVPR 2024
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
arXiv 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
arXiv 2024
Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
arXiv 2024
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
arXiv 2024
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
arXiv 2024
Towards Natural Image Matting in the Wild via Real-Scenario Prior
arXiv 2024
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
arXiv 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
arXiv 2024
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
arXiv 2024
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
arXiv 2024
LIME: Less Is More for MLLM Evaluation
arXiv 2024
Tree-Regularized Tabular Embeddings
arXiv 2024
Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
arXiv 2024
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
arXiv 2024
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
arXiv 2023
On the Tool Manipulation Capability of Open-source Large Language Models
arXiv 2023
Panoptic Video Scene Graph Generation
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
arXiv 2023
GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds
DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
arXiv 2023
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
CVPR 2024
OtterHD: A High-Resolution Multi-modality Model
arXiv 2023
FunQA: Towards Surprising Video Comprehension
arXiv 2023
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
CVPR 2023
Reconstructive Neuron Pruning for Backdoor Defense
arXiv 2023
Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
arXiv 2023
Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
arXiv 2023
AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies
arXiv 2022
Sparse Mixture-of-Experts are Domain Generalizable Learners
arXiv 2022
Can Brain Signals Reveal Inner Alignment with Human Languages?
arXiv 2022
Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability
arXiv 2020
Adversarial Mutual Information for Text Generation
ICML 2020