Long Chen

Papers
36

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

36

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

arXiv 2026

2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

arXiv 2026

2026

SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation

arXiv 2026

2026

Coarse-Guided Visual Generation via Weighted h-Transform Sampling

arXiv 2026

2026

MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization

arXiv 2026

2026

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

arXiv 2025

2025

DVGT: Driving Visual Geometry Transformer

arXiv 2025

2025

SimScale: Learning to Drive via Real-World Simulation at Scale

arXiv 2025

2025

Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation

arXiv 2025

2025

LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models

arXiv 2025

2025

FAS: Fast ANN-SNN Conversion for Spiking Large Language Models

arXiv 2025

2025

MiMo-Embodied: X-Embodied Foundation Model Technical Report

arXiv 2025

2025

GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

arXiv 2025

2025

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

CVPR 2025

2025

HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

ICCV 2025

2025

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

arXiv 2024

2024

A Survey on Multimodal Benchmarks: In the Era of Large AI Models

arXiv 2024

2024

GenAD: Generative End-to-End Autonomous Driving

arXiv 2024

2024

Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data

arXiv 2024

2024

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

arXiv 2024

2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

arXiv 2024

2024

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

CVPR 2025

2024

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

arXiv 2024

2024

GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning

arXiv 2024

2024

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

arXiv 2024

2024

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

arXiv 2024

2024

LingoQA: Visual Question Answering for Autonomous Driving

arXiv 2023

2023

Compositional Feature Augmentation for Unbiased Scene Graph Generation

ICCV 2023

2023

SortedAP: Rethinking evaluation metrics for instance segmentation

arXiv 2023

2023

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

CVPR 2024

2023

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

arXiv 2023

2023

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

arXiv 2023

2023

Transformer Meets Boundary Value Inverse Problems

arXiv 2022

2022

CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention

2021

CenterNet3D: An Anchor Free Object Detector for Point Cloud

arXiv 2020

2020

MixNet: Multi-modality Mix Network for Brain Segmentation

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 36 papers