Si Liu

Papers
30

Authored papers

30

ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models

arXiv 2026

2026

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

CVPR 2025

2025

Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency

arXiv 2025

2025

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

arXiv 2025

2025

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

arXiv 2025

2025

PICABench: How Far Are We from Physically Realistic Image Editing?

arXiv 2025

2025

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

arXiv 2025

2025

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

arXiv 2025

2025

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

arXiv 2025

2025

EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling

arXiv 2025

2025

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

arXiv 2024

2024

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

CVPR 2025

2024

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

arXiv 2024

2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

arXiv 2024

2024

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

arXiv 2024

2024

Image Understanding Makes for A Good Tokenizer for Image Generation

arXiv 2024

2024

Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

arXiv 2024

2024

MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More

arXiv 2024

2024

Communication-Efficient Collaborative Perception via Information Filling with Codebook

CVPR 2024

2024

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

arXiv 2024

2024

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

arXiv 2023

2023

Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

CVPR 2023

2023

Object as Query: Lifting any 2D Object Detector to 3D Detection

ICCV 2023

2023

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

CVPR 2023

2023

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

arXiv 2023

2023

Video Background Music Generation: Dataset, Method and Evaluation

ICCV 2023

2022

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

CVPR 2022

2022

General Instance Distillation for Object Detection

CVPR 2021

2021

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

arXiv 2020

2020

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 30 papers