Yi Yang
Authored papers (83)
CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage
arXiv 2026
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
arXiv 2026
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
arXiv 2026
AcademiClaw: When Students Set Challenges for AI Agents
arXiv 2026
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
arXiv 2026
EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge
arXiv 2026
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
arXiv 2026
Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
arXiv 2026
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
ICCV 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
arXiv 2025
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
ICCV 2025
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
arXiv 2025
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
arXiv 2025
HiMo: High-Speed Objects Motion Compensation in Point Clouds
arXiv 2025
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
arXiv 2025
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
arXiv 2025
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
arXiv 2025
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
arXiv 2025
FinMTEB: Finance Massive Text Embedding Benchmark
arXiv 2025
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
arXiv 2025
C-MTCSD: A Chinese Multi-Turn Conversational Stance Detection Dataset
arXiv 2025
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
arXiv 2025
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering
arXiv 2025
Advances in 4D Generation: A Survey
arXiv 2025
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
arXiv 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
arXiv 2025
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
arXiv 2025
Scaling 4D Representations
arXiv 2024
FlexDiT: Dynamic Token Density Control for Diffusion Transformer
arXiv 2024
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
arXiv 2024
Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?
arXiv 2024
Nonverbal Interaction Detection
arXiv 2024
MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection
arXiv 2024
TDDBench: A Benchmark for Training data detection
arXiv 2024
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning
arXiv 2024
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation
arXiv 2024
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
arXiv 2024
MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
arXiv 2024
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
arXiv 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
arXiv 2024
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
arXiv 2024
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
arXiv 2024
MS-DETR: Efficient DETR Training with Mixed Supervision
CVPR 2024 1
Replication in Visual Diffusion Models: A Survey and Outlook
arXiv 2024
AnyPattern: Towards In-context Image Copy Detection
arXiv 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
arXiv 2024
Improving Weak-to-Strong Generalization with Reliability-Aware Alignment
arXiv 2024
An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification
arXiv 2024
TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
ICCV 2023 1
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
arXiv 2023
InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning
arXiv 2023
Segment and Track Anything
arXiv 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
arXiv 2023
Clustering based Point Cloud Representation Learning for 3D Analysis
ICCV 2023 1
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery
ICCV 2023 1
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
CVPR 2024 1
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
ICCV 2023 1
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
CVPR 2024 1
Progressive Volume Distillation with Active Learning for Efficient NeRF Architecture Conversion
arXiv 2023
Bird's-Eye-View Scene Graph for Vision-Language Navigation
ICCV 2023 1
Human101: Training 100+FPS Human Gaussians in 100s from 1 View
arXiv 2023
Fast and Accurate Factual Inconsistency Detection Over Long Documents
arXiv 2023
Feature-compatible Progressive Learning for Video Copy Detection
arXiv 2023
TransHP: Image Classification with Hierarchical Prompting
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
ICCV 2023 1
Whitening-based Contrastive Learning of Sentence Embeddings
arXiv 2023
Compositional Feature Augmentation for Unbiased Scene Graph Generation
ICCV 2023 1
Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens
arXiv 2023
Video Object Segmentation in Panoptic Wild Scenes
arXiv 2023
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
arXiv 2022
V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval
arXiv 2022
Tele-Knowledge Pre-training for Fault Analysis
arXiv 2022
A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection
arXiv 2022
CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes
ACL 2021 5
D$^2$LV: A Data-Driven and Local-Verification Approach for Image Copy Detection
arXiv 2021
Bag of Tricks and A Strong baseline for Image Copy Detection
arXiv 2021
Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning
EMNLP 2020 11
NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search
arXiv 2020
FinBERT: A Pretrained Language Model for Financial Communications
arXiv 2020
Network Pruning via Transformable Architecture Search
Random Erasing Data Augmentation
arXiv 2017