Xin Jin
- Papers: 41
Authored papers (41)
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
arXiv 2026
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
arXiv 2026
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
arXiv 2026
Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
arXiv 2026
ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning
arXiv 2026
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution
arXiv 2026
Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
arXiv 2026
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
arXiv 2025
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
CVPR 2025 1
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
CVPR 2025 1
ORV: 4D Occupancy-centric Robot Video Generation
arXiv 2025
Reasoning in Space via Grounding in the World
arXiv 2025
InteractComp: Evaluating Search Agents With Ambiguous Queries
arXiv 2025
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
ICCV 2025
Adaptive Data Exploitation in Deep Reinforcement Learning
arXiv 2025
Distribution Matching Distillation Meets Reinforcement Learning
arXiv 2025
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
arXiv 2025
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
arXiv 2025
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
OmniNWM: Omniscient Driving Navigation World Models
arXiv 2025
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
arXiv 2025
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning
arXiv 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping
arXiv 2025
Examining the Source of Defects from a Mechanical Perspective for 3D Anomaly Detection
arXiv 2025
UniScene: Unified Occupancy-centric Driving Scene Generation
CVPR 2025 1
Towards RAW Object Detection in Diverse Conditions
CVPR 2025 1
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
arXiv 2024
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
arXiv 2024
SUMix: Mixup with Semantic and Uncertain Information
arXiv 2024
A Survey of Resource-efficient LLM and Multimodal Foundation Models
arXiv 2024
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
arXiv 2024
A Survey on Mixup Augmentations and Beyond
arXiv 2024
DreamLIP: Language-Image Pre-training with Long Captions
arXiv 2024
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations
arXiv 2024
Inpaint Anything: Segment Anything Meets Image Inpainting
arXiv 2023
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
arXiv 2023
Make Explicit Calibration Implicit: Calibrate Denoiser Instead of the Noise Model
ICCV 2023 1
Generalized Lightness Adaptation with Channel Selective Normalization
ICCV 2023 1
Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
CVPR 2023 1
NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation
ICCV 2023 1
Region Normalization for Image Inpainting
arXiv 2019
Affiliations
Frequent co-authors
10 from 41 papers