Si Liu
- Papers: 30
Authored papers
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
arXiv 2026
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
CVPR 2025
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
arXiv 2025
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
arXiv 2025
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
arXiv 2025
PICABench: How Far Are We from Physically Realistic Image Editing?
arXiv 2025
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
arXiv 2025
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
arXiv 2025
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
arXiv 2025
EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling
arXiv 2025
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
arXiv 2024
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
CVPR 2025
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
arXiv 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
arXiv 2024
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
arXiv 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
arXiv 2024
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
arXiv 2024
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More
arXiv 2024
Communication-Efficient Collaborative Perception via Information Filling with Codebook
CVPR 2024
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
arXiv 2024
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
arXiv 2023
Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection
CVPR 2023
Object as Query: Lifting any 2D Object Detector to 3D Detection
ICCV 2023
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
CVPR 2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
arXiv 2023
Video Background Music Generation: Dataset, Method and Evaluation
ICCV 2023
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
CVPR 2022
General Instance Distillation for Object Detection
CVPR 2021
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
arXiv 2020
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer