Yue Liu
- Papers: 34
Authored papers (34)
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
arXiv 2026
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents
arXiv 2026
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model
arXiv 2026
Balancing Understanding and Generation in Discrete Diffusion Models
arXiv 2026
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
arXiv 2025
Safety in Large Reasoning Models: A Survey
arXiv 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
arXiv 2025
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
arXiv 2025
FlowReasoner: Reinforcing Query-Level Meta-Agents
arXiv 2025
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
arXiv 2025
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
arXiv 2025
Geometric-Mean Policy Optimization
arXiv 2025
Efficient Inference for Large Reasoning Models: A Survey
arXiv 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
arXiv 2025
Pixels, Patterns, but No Poetry: To See The World like Humans
arXiv 2025
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
arXiv 2025
Towards Cross-View Point Correspondence in Vision-Language Models
arXiv 2025
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
arXiv 2025
VMamba: Visual State Space Model
arXiv 2024
vHeat: Building Vision Models upon Heat Conduction
arXiv 2024
CRAG -- Comprehensive RAG Benchmark
arXiv 2024
FlipAttack: Jailbreak LLMs via Flipping
arXiv 2024
DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution
arXiv 2024
Detecting Conversational Mental Manipulation with Intent-Aware Prompting
arXiv 2024
Deep Temporal Graph Clustering
arXiv 2023
Large Language Models for Software Engineering: A Systematic Literature Review
arXiv 2023
Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey
arXiv 2023
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
arXiv 2023
At Which Training Stage Does Code Data Help LLMs Reasoning?
arXiv 2023
A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal
arXiv 2022
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer
arXiv 2021
RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System
arXiv 2020
Frequent co-authors
10 from 34 papers