Yang Liu
Authored papers (202)
AgentOrchestra: Orchestrating Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol
arXiv 2025
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
arXiv 2026
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
arXiv 2026
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
arXiv 2026
FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
arXiv 2026
A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
arXiv 2026
Autoregressive Image Generation with Masked Bit Modeling
arXiv 2026
SkillNet: Create, Evaluate, and Connect AI Skills
arXiv 2026
MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation
arXiv 2026
LongCat-Flash-Thinking-2601 Technical Report
arXiv 2026
Exploring Spatial Intelligence from a Generative Perspective
arXiv 2026
BabyVision: Visual Reasoning Beyond Language
arXiv 2026
Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models
arXiv 2026
UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
arXiv 2026
$OneMillion-Bench: How Far are Language Agents from Human Experts?
arXiv 2026
Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
arXiv 2026
LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts
arXiv 2026
STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media
arXiv 2026
ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models
arXiv 2026
Video-Based Reward Modeling for Computer-Use Agents
arXiv 2026
Matrix-3D: Omnidirectional Explorable 3D World Generation
arXiv 2025
Qwen3-VL Technical Report
arXiv 2025
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
arXiv 2025
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
arXiv 2025
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
arXiv 2025
3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians
arXiv 2025
Skywork Open Reasoner 1 Technical Report
arXiv 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
arXiv 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
arXiv 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
arXiv 2025
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
arXiv 2025
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
arXiv 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
arXiv 2025
Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
arXiv 2025
GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
arXiv 2025
AutoMV: An Automatic Multi-Agent System for Music Video Generation
arXiv 2025
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
arXiv 2025
OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation
arXiv 2025
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
arXiv 2025
PromptBridge: Cross-Model Prompt Transfer for Large Language Models
arXiv 2025
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
arXiv 2025
Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation
arXiv 2025
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
arXiv 2025
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
arXiv 2025
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
arXiv 2025
Qwen3Guard Technical Report
arXiv 2025
VGGT-X: When VGGT Meets Dense Novel View Synthesis
arXiv 2025
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
arXiv 2025
Uniform Discrete Diffusion with Metric Path for Video Generation
arXiv 2025
Emu3.5: Native Multimodal Models are World Learners
arXiv 2025
Matrix-Game: Interactive World Foundation Model
arXiv 2025
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
arXiv 2025
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
arXiv 2025
TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
arXiv 2025
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
arXiv 2025
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging
arXiv 2025
AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery
arXiv 2025
DART: Differentiable Dynamic Adaptive Region Tokenizer for Vision Transformer and Mamba
arXiv 2025
Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?
arXiv 2025
DisTime: Distribution-based Time Representation for Video Large Language Models
ICCV 2025
Safety at Scale: A Comprehensive Survey of Large Model Safety
arXiv 2025
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
CVPR 2025 1
Visual Abstract Thinking Empowers Multimodal Reasoning
arXiv 2025
Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation
DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms
arXiv 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
arXiv 2025
Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization
arXiv 2025
Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents
arXiv 2025
SteeringControl: Holistic Evaluation of Alignment Steering in LLMs
arXiv 2025
Regretful Decisions under Label Noise
arXiv 2025
Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability
arXiv 2025
Skywork-R1V3 Technical Report
arXiv 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
arXiv 2025
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
arXiv 2025
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
ICCV 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
arXiv 2025
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
arXiv 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
arXiv 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
ICCV 2025
Cross-modal Causal Relation Alignment for Video Question Grounding
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
arXiv 2025
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
arXiv 2025
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
CVPR 2025 1
Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation
arXiv 2025
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
arXiv 2025
Large Language Models for Cyber Security: A Systematic Literature Review
arXiv 2024
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
arXiv 2024
Discovering symbolic expressions with parallelized tree search
arXiv 2024
Datasets for Large Language Models: A Comprehensive Survey
arXiv 2024
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
arXiv 2024
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
arXiv 2024
FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization
arXiv 2024
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
arXiv 2024
AIGS: Generating Science from AI-Powered Automated Falsification
arXiv 2024
Vision-Language Models Can Self-Improve Reasoning via Reflection
arXiv 2024
MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments
arXiv 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
arXiv 2024
Xmodel-LM Technical Report
arXiv 2024
BadEdit: Backdooring large language models by model editing
arXiv 2024
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models
arXiv 2024
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
arXiv 2024
Large Language Model Unlearning via Embedding-Corrupted Prompts
arXiv 2024
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
arXiv 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
arXiv 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
arXiv 2024
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
arXiv 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
arXiv 2024
Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models
arXiv 2024
LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework
arXiv 2024
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
arXiv 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
arXiv 2024
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
arXiv 2024
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
arXiv 2024
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
arXiv 2024
Model Composition for Multimodal Large Language Models
arXiv 2024
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
arXiv 2024
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
arXiv 2024
On the Role of Attention Heads in Large Language Model Safety
arXiv 2024
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
CVPR 2024 1
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation
arXiv 2024
TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On
arXiv 2024
GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration
arXiv 2024
Reasoning-Enhanced Object-Centric Learning for Videos
arXiv 2024
Switch EMA: A Free Lunch for Better Flatness and Sharpness
arXiv 2024
WAS: Dataset and Methods for Artistic Text Segmentation
arXiv 2024
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers
arXiv 2024
Learning to Watermark LLM-generated Text via Reinforcement Learning
arXiv 2024
RobustTSF: Towards Theory and Design of Robust Time Series Forecasting with Anomalies
arXiv 2024
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
arXiv 2024
LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context
arXiv 2024
Rigid Protein-Protein Docking via Equivariant Elliptic-Paraboloid Interface Prediction
arXiv 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
arXiv 2024
Robust Multi-bit Text Watermark with LLM-based Paraphrasers
arXiv 2024
3D Vision and Language Pretraining with Large-Scale Synthetic Data
arXiv 2024
Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy
arXiv 2024
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
arXiv 2024
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
arXiv 2023
TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective
ICCV 2023 1
Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
arXiv 2023
Locally Attentional SDF Diffusion for Controllable 3D Shape Generation
arXiv 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
CVPR 2024 1
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
arXiv 2023
3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
arXiv 2023
Failures Pave the Way: Enhancing Large Language Models through Tuning-free Rule Accumulation
arXiv 2023
Few-shot Hybrid Domain Adaptation of Image Generators
arXiv 2023
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool
arXiv 2023
Post-hoc Bias Scoring Is Optimal For Fair Classification
arXiv 2023
Procedural Fairness Through Decoupling Objectionable Data Generating Components
arXiv 2023
Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis
arXiv 2023
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
arXiv 2023
VFLAIR: A Research Library and Benchmark for Vertical Federated Learning
arXiv 2023
Multimodal Federated Learning via Contrastive Representation Ensemble
arXiv 2023
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
arXiv 2023
Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding
arXiv 2023
Crystal Structure Prediction by Joint Equivariant Diffusion
SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation
arXiv 2023
Small Models are Valuable Plug-ins for Large Language Models
arXiv 2023
End-to-End Full-Atom Antibody Design
arXiv 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
ICCV 2023 1
Model Sparsity Can Simplify Machine Unlearning
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf
arXiv 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
CVPR 2024 1
PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
arXiv 2023
AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception
ICCV 2023 1
Sparse Modular Activation for Efficient Sequence Modeling
Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory
ICCV 2023 1
ZeroFlow: Scalable Scene Flow via Distillation
arXiv 2023
IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks
arXiv 2023
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
arXiv 2023
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions
arXiv 2023
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
arXiv 2022
Unifying Vision, Text, and Layout for Universal Document Processing
CVPR 2023 1
Towards a Unified Multi-Dimensional Evaluator for Text Generation
arXiv 2022
Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes
arXiv 2022
ANUBIS: Skeleton Action Recognition Dataset, Review, and Benchmark
arXiv 2022
DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines
arXiv 2022
SDF-StyleGAN: Implicit SDF-Based StyleGAN for 3D Shape Generation
arXiv 2022
Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training
arXiv 2022
UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization
arXiv 2022
AixBench: A Code Generation Benchmark Dataset
arXiv 2022
Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN
arXiv 2022
On Robust Prefix-Tuning for Text Classification
Improving Bot Response Contradiction Detection via Utterance Rewriting
SIGDIAL (ACL) 2022 9
An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation
arXiv 2022
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
ICCV 2021 10
DialogSum: A Real-Life Scenario Dialogue Summarization Dataset
Findings (ACL) 2021 8
VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator
Findings (ACL) 2022 5
Model Transferability With Responsive Decision Subjects
arXiv 2021
QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization
NAACL 2021 4
DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization
arXiv 2021
MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization
NAACL 2021 4
Commonsense-Focused Dialogues for Response Generation: An Empirical Study
SIGDIAL (ACL) 2021 7
Cross Modal Retrieval with Querybank Normalisation
CVPR 2022 1
MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
ACL 2022 5
DirectQuote: A Dataset for Direct Quotation Extraction and Attribution in News Articles
LREC 2022 6
Mask-Align: Self-Supervised Neural Word Alignment
ACL 2021 5
Noisy Self-Knowledge Distillation for Text Summarization
NAACL 2021 4
Fine-tune BERT for Extractive Summarization
Text Summarization with Pretrained Encoders
Generating Summaries with Topic Templates and Structured Convolutional Decoders
Actionable Recourse in Linear Classification
arXiv 2018