Jian Yang
Papers: 106
Authored papers
WaDi: Weight Direction-aware Distillation for One-step Image Synthesis
arXiv 2026
RefAlign: Representation Alignment for Reference-to-Video Generation
arXiv 2026
MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing
arXiv 2026
IQuest-Coder-V1 Technical Report
arXiv 2026
Muses: Designing, Composing, Generating Nonexistent Fantasy 3D Creatures without Training
arXiv 2026
LongCat-Flash-Thinking-2601 Technical Report
arXiv 2026
InCoder-32B: Code Foundation Model for Industrial Scenarios
arXiv 2026
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
arXiv 2026
Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments
arXiv 2026
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
arXiv 2026
DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
arXiv 2026
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
arXiv 2026
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
arXiv 2026
L2P: Unlocking Latent Potential for Pixel Generation
arXiv 2026
Qwen3 Technical Report
preprint
YuE: Scaling Open Foundation Models for Long-Form Music Generation
arXiv 2025
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
ICCV 2025
GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats
arXiv 2025
TaskCraft: Automated Generation of Agentic Tasks
arXiv 2025
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection
arXiv 2025
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
arXiv 2025
A Comprehensive Survey on Long Context Language Modeling
arXiv 2025
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
arXiv 2025
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
arXiv 2025
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
arXiv 2025
See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction
arXiv 2025
StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models
arXiv 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
CVPR 2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
arXiv 2025
A Survey on Latent Reasoning
arXiv 2025
OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding
arXiv 2025
DiP: Taming Diffusion Models in Pixel Space
arXiv 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
arXiv 2025
Efficient Agents: Building Effective Agents While Reducing Cost
arXiv 2025
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
arXiv 2025
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
arXiv 2025
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
arXiv 2025
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
arXiv 2025
RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation
arXiv 2025
A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
arXiv 2025
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
arXiv 2025
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
arXiv 2025
Gaussian Splatting with Discretized SDF for Relightable Assets
ICCV 2025
Multilingual Multimodal Software Developer for Code Generation
arXiv 2025
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
arXiv 2025
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
arXiv 2025
Subject-Consistent and Pose-Diverse Text-to-Image Generation
arXiv 2025
SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner
arXiv 2025
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
arXiv 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
arXiv 2025
USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models
arXiv 2025
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
arXiv 2025
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents
arXiv 2025
Qwen2.5 Technical Report
arXiv 2024
Qwen2 Technical Report
arXiv 2024
Evaluating and Aligning CodeLLMs on Human Preference
arXiv 2024
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
arXiv 2024
ATPrompt: Textual Prompt Learning with Embedded Attributes
ICCV 2025
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
arXiv 2024
OmniBench: Towards The Future of Universal Omni-Language Models
arXiv 2024
HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes
arXiv 2024
SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor
arXiv 2024
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection
arXiv 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
arXiv 2024
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
CVPR 2024
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
arXiv 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
arXiv 2024
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
arXiv 2024
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
arXiv 2024
Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models
arXiv 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
arXiv 2024
Cascade Prompt Learning for Vision-Language Model Adaptation
arXiv 2024
Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction
arXiv 2024
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
ICCV 2025
Barbie: Text to Barbie-Style 3D Avatars
arXiv 2024
Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
arXiv 2024
FuzzCoder: Byte-level Fuzzing Test via Large Language Model
arXiv 2024
McEval: Massively Multilingual Code Evaluation
arXiv 2024
SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence
arXiv 2024
RotationDrag: Point-based Image Editing with Rotated Diffusion Features
arXiv 2024
RNG: Relightable Neural Gaussians
CVPR 2025
Customized Generation Reimagined: Fidelity and Editability Harmonized
arXiv 2024
LIME: Less Is More for MLLM Evaluation
arXiv 2024
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies
arXiv 2024
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
CVPR 2025
Qwen Technical Report
arXiv 2023
Robust Outlier Rejection for 3D Registration with Variational Bayes
CVPR 2023
OWL: A Large Language Model for IT Operations
arXiv 2023
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
arXiv 2023
Large Selective Kernel Network for Remote Sensing Object Detection
ICCV 2023
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference
arXiv 2023
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
arXiv 2023
Fine-Grained Visual Prompting
NeurIPS 2023
FinPT: Financial Risk Prediction with Profile Tuning on Pretrained Foundation Models
arXiv 2023
Creative Birds: Self-Supervised Single-View 3D Style Transfer
ICCV 2023
MT4CrossOIE: Multi-stage Tuning for Cross-lingual Open Information Extraction
arXiv 2023
Enhancing Large Language Model with Self-Controlled Memory Framework
arXiv 2023
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation
arXiv 2022
HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation
arXiv 2022
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
arXiv 2022
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation
arXiv 2022
SEMICON: A Learning-to-hash Solution for Large-scale Fine-grained Image Retrieval
arXiv 2022
UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation
arXiv 2022
Contrastive Embedding for Generalized Zero-Shot Learning
CVPR 2021
Selective Kernel Networks
2017 Robotic Instrument Segmentation Challenge
arXiv 2019