Long Chen
- Papers
- 36
Authored papers: 36
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
arXiv 2026
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
arXiv 2026
SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation
arXiv 2026
Coarse-Guided Visual Generation via Weighted h-Transform Sampling
arXiv 2026
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
arXiv 2026
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
arXiv 2025
DVGT: Driving Visual Geometry Transformer
arXiv 2025
SimScale: Learning to Drive via Real-World Simulation at Scale
arXiv 2025
Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation
arXiv 2025
LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models
arXiv 2025
FAS: Fast ANN-SNN Conversion for Spiking Large Language Models
arXiv 2025
MiMo-Embodied: X-Embodied Foundation Model Technical Report
arXiv 2025
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
arXiv 2025
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
CVPR 2025
HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
ICCV 2025
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
arXiv 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
arXiv 2024
GenAD: Generative End-to-End Autonomous Driving
arXiv 2024
Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data
arXiv 2024
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
arXiv 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
arXiv 2024
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
CVPR 2025
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
arXiv 2024
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
arXiv 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
arXiv 2024
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
arXiv 2024
LingoQA: Visual Question Answering for Autonomous Driving
arXiv 2023
Compositional Feature Augmentation for Unbiased Scene Graph Generation
ICCV 2023
SortedAP: Rethinking evaluation metrics for instance segmentation
arXiv 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
CVPR 2024
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
arXiv 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
arXiv 2023
Transformer Meets Boundary Value Inverse Problems
arXiv 2022
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
CenterNet3D: An Anchor Free Object Detector for Point Cloud
arXiv 2020
MixNet: Multi-modality Mix Network for Brain Segmentation
arXiv 2020
Affiliations
Frequent co-authors
10 from 36 papers