Yong Liu

Papers
54

Authored papers

54

PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

arXiv 2026

2026

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

arXiv 2026

2026

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

arXiv 2026

2026

Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking

arXiv 2026

2026

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

arXiv 2026

2026

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

arXiv 2026

2026

DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

arXiv 2026

2026

AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection

arXiv 2025

2025

Sundial: A Family of Highly Capable Time Series Foundation Models

arXiv 2025

2025

OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain

arXiv 2025

2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

arXiv 2025

2025

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

arXiv 2025

2025

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

arXiv 2025

2025

Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering

arXiv 2025

2025

SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

arXiv 2025

2025

3D and 4D World Modeling: A Survey

arXiv 2025

2025

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

arXiv 2025

2025

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

arXiv 2025

2025

Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation

arXiv 2025

2025

Reinforcement Learning Foundations for Deep Research Systems: A Survey

arXiv 2025

2025

O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering

arXiv 2025

2025

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

arXiv 2025

2025

Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization

arXiv 2025

2025

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

arXiv 2024

2024

MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

arXiv 2024

2024

ColorFlow: Retrieval-Augmented Image Sequence Colorization

arXiv 2024

2024

A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection

arXiv 2024

2024

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

arXiv 2024

2024

REEF: Representation Encoding Fingerprints for Large Language Models

arXiv 2024

2024

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation

CVPR 2025 1

2024

EMOv2: Pushing 5M Vision Model Frontier

arXiv 2024

2024

UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling

arXiv 2024

2024

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

arXiv 2024

2024

LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description

arXiv 2024

2024

Timer: Generative Pre-trained Transformers Are Large Time Series Models

arXiv 2024

2024

TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

arXiv 2024

2024

Tuning-Free Image Customization with Image and Text Guidance

arXiv 2024

2024

Adapting LLaMA Decoder to Vision Transformer

arXiv 2024

2024

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

arXiv 2024

2024

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

arXiv 2024

2024

Parameter-Efficient Conversational Recommender System as a Language Processing Task

arXiv 2024

2024

From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning

arXiv 2024

2024

P3P: Pseudo-3D Pre-training for Scaling 3D Voxel-based Masked Autoencoders

arXiv 2024

2024

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

CVPR 2024 1

2023

Sentence-level Prompts Benefit Composed Image Retrieval

arXiv 2023

2023

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

arXiv 2023

2023

Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects

arXiv 2023

2023

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

ICCV 2023 1

2023

Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning

ICCV 2023 1

2023

RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction

ICCV 2023 1

2023

Can Large Language Models Empower Molecular Property Prediction?

arXiv 2023

2023

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

2023

Bootstrap Latent Representations for Multi-modal Recommendation

arXiv 2022

2022

CoMAE: A Multi-factor Hierarchical Framework for Empathetic Response Generation

Findings (ACL) 2021 8

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 54 papers