Jiaya Jia

Papers
44

Authored papers

44

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

arXiv 2026

2026

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

arXiv 2026

2026

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

arXiv 2025

2025

Training-Free Efficient Video Generation via Dynamic Token Carving

arXiv 2025

2025

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

arXiv 2025

2025

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

arXiv 2025

2025

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

arXiv 2025

2025

RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

arXiv 2025

2025

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

arXiv 2025

2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

arXiv 2025

2025

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

arXiv 2025

2025

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

ICCV 2025

2025

TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

arXiv 2025

2025

Logits-Based Finetuning

arXiv 2025

2025

STEVE: A Step Verification Pipeline for Computer-use Agent Training

arXiv 2025

2025

DreamOmni3: Scribble-based Editing and Generation

arXiv 2025

2025

VisionZip: Longer is Better but Not Necessary in Vision Language Models

CVPR 2025

2024

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

arXiv 2024

2024

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

ICCV 2025

2024

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

arXiv 2024

2024

QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

arXiv 2024

2024

Scalable Language Model with Generalized Continual Learning

arXiv 2024

2024

Unified Language-driven Zero-shot Domain Adaptation

CVPR 2024

2024

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

arXiv 2024

2024

LISA: Reasoning Segmentation via Large Language Model

CVPR 2024

2023

Spherical Transformer for LiDAR-based 3D Recognition

CVPR 2023

2023

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

arXiv 2023

2023

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

arXiv 2023

2023

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

2023

LLMGA: Multimodal Large Language Model based Generation Assistant

arXiv 2023

2023

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

arXiv 2023

2023

MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

arXiv 2023

2023

Mask-Attention-Free Transformer for 3D Instance Segmentation

ICCV 2023

2023

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

arXiv 2023

2023

Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need

CVPR 2023

2023

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

CVPR 2022

2022

High-Quality Entity Segmentation

arXiv 2022

2022

Focal Sparse Convolutional Networks for 3D Object Detection

CVPR 2022

2022

Image Inpainting via Iteratively Decoupled Probabilistic Modeling

arXiv 2022

2022

Jigsaw Clustering for Unsupervised Visual Representation Learning

CVPR 2021

2021

GridMask Data Augmentation

arXiv 2020

2020

VCNet: A Robust Approach to Blind Image Inpainting

ECCV 2020

2020

Image Inpainting via Generative Multi-column Convolutional Neural Networks

2018

Path Aggregation Network for Instance Segmentation

2018

Affiliations

No known affiliations.

Frequent co-authors

10

from 44 papers