Yi Yang

Papers
83

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

83

CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage

arXiv 2026

2026

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

arXiv 2026

2026

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

arXiv 2026

2026

AcademiClaw: When Students Set Challenges for AI Agents

arXiv 2026

2026

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

arXiv 2026

2026

EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge

arXiv 2026

2026

Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

arXiv 2026

2026

Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models

arXiv 2026

2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

2026

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference

arXiv 2026

2026

TAPNext: Tracking Any Point (TAP) as Next Token Prediction

ICCV 2025

2025

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

arXiv 2025

2025

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

ICCV 2025

2025

FlexSelect: Flexible Token Selection for Efficient Long Video Understanding

arXiv 2025

2025

VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing

arXiv 2025

2025

HiMo: High-Speed Objects Motion Compensation in Point Clouds

arXiv 2025

2025

Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

arXiv 2025

2025

ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation

arXiv 2025

2025

Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective

arXiv 2025

2025

GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings

arXiv 2025

2025

FinMTEB: Finance Massive Text Embedding Benchmark

arXiv 2025

2025

BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment

arXiv 2025

2025

C-MTCSD: A Chinese Multi-Turn Conversational Stance Detection Dataset

arXiv 2025

2025

DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization

arXiv 2025

2025

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering

arXiv 2025

2025

Advances in 4D Generation: A Survey

arXiv 2025

2025

Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

arXiv 2025

2025

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

arXiv 2025

2025

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

arXiv 2025

2025

Scaling 4D Representations

arXiv 2024

2024

FlexDiT: Dynamic Token Density Control for Diffusion Transformer

arXiv 2024

2024

SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving

arXiv 2024

2024

Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?

arXiv 2024

2024

Nonverbal Interaction Detection

arXiv 2024

2024

MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection

arXiv 2024

2024

TDDBench: A Benchmark for Training data detection

arXiv 2024

2024

Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning

arXiv 2024

2024

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation

arXiv 2024

2024

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

arXiv 2024

2024

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

arXiv 2024

2024

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

arXiv 2024

2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

arXiv 2024

2024

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

arXiv 2024

2024

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

arXiv 2024

2024

MS-DETR: Efficient DETR Training with Mixed Supervision

CVPR 2024

2024

Replication in Visual Diffusion Models: A Survey and Outlook

arXiv 2024

2024

AnyPattern: Towards In-context Image Copy Detection

arXiv 2024

2024

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs

arXiv 2024

2024

Improving Weak-to-Strong Generalization with Reliability-Aware Alignment

arXiv 2024

2024

An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification

arXiv 2024

2024

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

ICCV 2023

2023

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

arXiv 2023

2023

InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning

arXiv 2023

2023

Segment and Track Anything

arXiv 2023

2023

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

arXiv 2023

2023

Clustering based Point Cloud Representation Learning for 3D Analysis

ICCV 2023

2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

ICCV 2023

2023

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

CVPR 2024

2023

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

ICCV 2023

2023

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

CVPR 2024

2023

Progressive Volume Distillation with Active Learning for Efficient NeRF Architecture Conversion

arXiv 2023

2023

Bird's-Eye-View Scene Graph for Vision-Language Navigation

ICCV 2023

2023

Human101: Training 100+FPS Human Gaussians in 100s from 1 View

arXiv 2023

2023

Fast and Accurate Factual Inconsistency Detection Over Long Documents

arXiv 2023

2023

Feature-compatible Progressive Learning for Video Copy Detection

arXiv 2023

2023

TransHP: Image Classification with Hierarchical Prompting

2023

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

ICCV 2023

2023

Whitening-based Contrastive Learning of Sentence Embeddings

arXiv 2023

2023

Compositional Feature Augmentation for Unbiased Scene Graph Generation

ICCV 2023

2023

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

arXiv 2023

2023

Video Object Segmentation in Panoptic Wild Scenes

arXiv 2023

2023

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

arXiv 2022

2022

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

arXiv 2022

2022

Tele-Knowledge Pre-training for Fault Analysis

arXiv 2022

2022

A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection

arXiv 2022

2022

CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

ACL 2021

2021

D$^2$LV: A Data-Driven and Local-Verification Approach for Image Copy Detection

arXiv 2021

2021

Bag of Tricks and A Strong baseline for Image Copy Detection

arXiv 2021

2021

Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

EMNLP 2020

2020

NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search

arXiv 2020

2020

FinBERT: A Pretrained Language Model for Financial Communications

arXiv 2020

2020

Network Pruning via Transformable Architecture Search

2019

Random Erasing Data Augmentation

arXiv 2017

2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 83 papers