Yong Liu
- Papers
- 54
Authored papers (54)
PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset
arXiv 2026
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
arXiv 2026
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
arXiv 2026
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking
arXiv 2026
The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
arXiv 2026
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
arXiv 2026
DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain
arXiv 2026
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
arXiv 2025
Sundial: A Family of Highly Capable Time Series Foundation Models
arXiv 2025
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
arXiv 2025
LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
arXiv 2025
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
arXiv 2025
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
arXiv 2025
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
arXiv 2025
SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation
arXiv 2025
3D and 4D World Modeling: A Survey
arXiv 2025
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
arXiv 2025
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
arXiv 2025
Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation
arXiv 2025
Reinforcement Learning Foundations for Deep Research Systems: A Survey
arXiv 2025
O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
arXiv 2025
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
arXiv 2025
Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization
arXiv 2025
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
arXiv 2024
MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
arXiv 2024
ColorFlow: Retrieval-Augmented Image Sequence Colorization
arXiv 2024
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection
arXiv 2024
HyperSeg: Towards Universal Visual Segmentation with Large Language Model
arXiv 2024
REEF: Representation Encoding Fingerprints for Large Language Models
arXiv 2024
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
CVPR 2025 1
EMOv2: Pushing 5M Vision Model Frontier
arXiv 2024
UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling
arXiv 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
arXiv 2024
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
arXiv 2024
Timer: Generative Pre-trained Transformers Are Large Time Series Models
arXiv 2024
TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables
arXiv 2024
Tuning-Free Image Customization with Image and Text Guidance
arXiv 2024
Adapting LLaMA Decoder to Vision Transformer
arXiv 2024
TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On
arXiv 2024
BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays
arXiv 2024
Parameter-Efficient Conversational Recommender System as a Language Processing Task
arXiv 2024
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
arXiv 2024
P3P: Pseudo-3D Pre-training for Scaling 3D Voxel-based Masked Autoencoders
arXiv 2024
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
CVPR 2024 1
Sentence-level Prompts Benefit Composed Image Retrieval
arXiv 2023
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
arXiv 2023
Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects
arXiv 2023
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching
ICCV 2023 1
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
ICCV 2023 1
RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction
ICCV 2023 1
Can Large Language Models Empower Molecular Property Prediction?
arXiv 2023
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
Bootstrap Latent Representations for Multi-modal Recommendation
arXiv 2022
CoMAE: A Multi-factor Hierarchical Framework for Empathetic Response Generation
Findings (ACL) 2021 8