Jiaya Jia
- Papers: 44
Authored papers (44)
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
arXiv 2026
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
arXiv 2026
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
arXiv 2025
Training-Free Efficient Video Generation via Dynamic Token Carving
arXiv 2025
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
arXiv 2025
ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
arXiv 2025
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
arXiv 2025
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing
arXiv 2025
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
arXiv 2025
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
arXiv 2025
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
arXiv 2025
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
ICCV 2025
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
arXiv 2025
Logits-Based Finetuning
arXiv 2025
STEVE: A Step Verification Pipeline for Computer-use Agent Training
arXiv 2025
DreamOmni3: Scribble-based Editing and Generation
arXiv 2025
VisionZip: Longer is Better but Not Necessary in Vision Language Models
CVPR 2025 1
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
arXiv 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
ICCV 2025
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
arXiv 2024
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
arXiv 2024
Scalable Language Model with Generalized Continual Learning
arXiv 2024
Unified Language-driven Zero-shot Domain Adaptation
CVPR 2024 1
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
arXiv 2024
LISA: Reasoning Segmentation via Large Language Model
CVPR 2024 1
Spherical Transformer for LiDAR-based 3D Recognition
CVPR 2023 1
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
arXiv 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
arXiv 2023
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
CVPR 2023
LLMGA: Multimodal Large Language Model based Generation Assistant
arXiv 2023
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
arXiv 2023
MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks
arXiv 2023
Mask-Attention-Free Transformer for 3D Instance Segmentation
ICCV 2023 1
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
arXiv 2023
Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need
CVPR 2023 1
MAT: Mask-Aware Transformer for Large Hole Image Inpainting
CVPR 2022 1
High-Quality Entity Segmentation
arXiv 2022
Focal Sparse Convolutional Networks for 3D Object Detection
CVPR 2022 1
Image Inpainting via Iteratively Decoupled Probabilistic Modeling
arXiv 2022
Jigsaw Clustering for Unsupervised Visual Representation Learning
CVPR 2021 1
GridMask Data Augmentation
arXiv 2020
VCNet: A Robust Approach to Blind Image Inpainting
ECCV 2020 8
Image Inpainting via Generative Multi-column Convolutional Neural Networks
NeurIPS 2018
Path Aggregation Network for Instance Segmentation
CVPR 2018