Yifan Yang
- Papers: 35
Authored papers
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
arXiv 2026
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
arXiv 2026
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
arXiv 2026
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
arXiv 2026
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
arXiv 2026
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
arXiv 2026
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
arXiv 2026
Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training
arXiv 2026
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
arXiv 2026
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark
arXiv 2026
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
arXiv 2025
Region-Adaptive Sampling for Diffusion Transformers
arXiv 2025
MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?
arXiv 2025
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
arXiv 2025
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
ICCV 2025
$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking
arXiv 2025
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
arXiv 2025
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning
arXiv 2024
Demystifying Large Language Models for Medicine: A Primer
arXiv 2024
VecCity: A Taxonomy-guided Library for Map Entity Representation Learning
arXiv 2024
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
arXiv 2024
REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents
ICCV 2025
CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation
arXiv 2024
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
arXiv 2024
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
arXiv 2024
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
arXiv 2024
AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning
arXiv 2024
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
arXiv 2024
Delay-penalized CTC implemented based on Finite State Transducer
arXiv 2023
Matching Patients to Clinical Trials with Large Language Models
arXiv 2023
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
arXiv 2023
Cross-Ray Neural Radiance Fields for Novel-view Synthesis from Unconstrained Image Collections
ICCV 2023
GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information
arXiv 2023
Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score
arXiv 2023
Attentive Mask CLIP
ICCV 2023
Frequent co-authors
10 from 35 papers