Yue Liu

Papers
34

Authored papers

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

arXiv 2026

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

arXiv 2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

arXiv 2026

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

arXiv 2026

Balancing Understanding and Generation in Discrete Diffusion Models

arXiv 2026

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference

arXiv 2025

Safety in Large Reasoning Models: A Survey

arXiv 2025

Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation

arXiv 2025

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

arXiv 2025

FlowReasoner: Reinforcing Query-Level Meta-Agents

arXiv 2025

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

arXiv 2025

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

arXiv 2025

Geometric-Mean Policy Optimization

arXiv 2025

Efficient Inference for Large Reasoning Models: A Survey

arXiv 2025

GuardReasoner: Towards Reasoning-based LLM Safeguards

arXiv 2025

Pixels, Patterns, but No Poetry: To See The World like Humans

arXiv 2025

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

arXiv 2025

Towards Cross-View Point Correspondence in Vision-Language Models

arXiv 2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

arXiv 2025

VMamba: Visual State Space Model

arXiv 2024

vHeat: Building Vision Models upon Heat Conduction

arXiv 2024

CRAG -- Comprehensive RAG Benchmark

arXiv 2024

FlipAttack: Jailbreak LLMs via Flipping

arXiv 2024

DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution

arXiv 2024

Detecting Conversational Mental Manipulation with Intent-Aware Prompting

arXiv 2024

Deep Temporal Graph Clustering

arXiv 2023

Large Language Models for Software Engineering: A Systematic Literature Review

arXiv 2023

Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

arXiv 2023

Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?

arXiv 2023

At Which Training Stage Does Code Data Help LLMs Reasoning?

arXiv 2023

A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal

arXiv 2022

CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer

arXiv 2021

RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System

arXiv 2020

Affiliations

No known affiliations.

Frequent co-authors

10 (from 34 papers)