
Yifan Yang

Papers
35


Attribution: Semantic Scholar

Authored papers

35

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

arXiv 2026

2026

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

arXiv 2026

2026

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

arXiv 2026

2026

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

arXiv 2026

2026

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

arXiv 2026

2026

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

arXiv 2026

2026

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

arXiv 2026

2026

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

arXiv 2026

2026

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

arXiv 2026

2026

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

arXiv 2026

2026

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

arXiv 2025

2025

Region-Adaptive Sampling for Diffusion Transformers

arXiv 2025

2025

MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?

arXiv 2025

2025

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

arXiv 2025

2025

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

ICCV 2025

2025

C³-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking

arXiv 2025

2025

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

arXiv 2025

2025

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

arXiv 2024

2024

Demystifying Large Language Models for Medicine: A Primer

arXiv 2024

2024

VecCity: A Taxonomy-guided Library for Map Entity Representation Learning

arXiv 2024

2024

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

arXiv 2024

2024

REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents

ICCV 2025

2024

CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation

arXiv 2024

2024

BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models

arXiv 2024

2024

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

arXiv 2024

2024

LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models

arXiv 2024

2024

AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning

arXiv 2024

2024

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

arXiv 2024

2024

Delay-penalized CTC implemented based on Finite State Transducer

arXiv 2023

2023

Matching Patients to Clinical Trials with Large Language Models

arXiv 2023

2023

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

arXiv 2023

2023

Cross-Ray Neural Radiance Fields for Novel-view Synthesis from Unconstrained Image Collections

ICCV 2023

2023

GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information

arXiv 2023

2023

Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score

arXiv 2023

2023

Attentive Mask CLIP

ICCV 2023

2022

Affiliations

No known affiliations.

Frequent co-authors

10 (from 35 papers)