Yu Zhang
Authored papers: 60
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
arXiv 2026
Attention Residuals
arXiv 2026
MAIC-UI: Making Interactive Courseware with Generative UI
arXiv 2026
Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey
arXiv 2026
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
arXiv 2026
Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition
arXiv 2026
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
EliGen: Entity-Level Controlled Image Generation with Regional Attention
arXiv 2025
Adaptation of Agentic AI
arXiv 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
arXiv 2025
RM-R1: Reward Modeling as Reasoning
arXiv 2025
Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing
arXiv 2025
Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning
arXiv 2025
Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
arXiv 2025
GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation
arXiv 2025
Knowledge Homophily in Large Language Models
arXiv 2025
Versatile Framework for Song Generation with Prompt-based Control
arXiv 2025
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
arXiv 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
arXiv 2025
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
arXiv 2025
AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity
arXiv 2025
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning
arXiv 2025
Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective
arXiv 2025
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity
arXiv 2025
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
arXiv 2025
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
arXiv 2024
Scalable MatMul-free Language Modeling
arXiv 2024
SMUTF: Schema Matching Using Generative Tags and Hybrid Features
arXiv 2024
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
arXiv 2024
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
CVPR 2024 1
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
arXiv 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
arXiv 2024
AUITestAgent: Automatic Requirements Oriented GUI Function Testing
arXiv 2024
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
arXiv 2024
DEGAS: Detailed Expressions on Full-Body Gaussian Avatars
arXiv 2024
RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
arXiv 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025 1
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
arXiv 2024
Image Watermarks are Removable Using Controllable Regeneration from Clean Noise
arXiv 2024
Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective
arXiv 2024
UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment
arXiv 2024
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
arXiv 2024
Online Test-Time Adaptation of Spatial-Temporal Traffic Flow Forecasting
arXiv 2024
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
arXiv 2023
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
arXiv 2023
Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
CVPR 2024 1
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
Multi-view Self-supervised Disentanglement for General Image Denoising
ICCV 2023 1
Non-autoregressive Text Editing with Copy-aware Latent Alignments
arXiv 2023
Explanation Graph Generation via Generative Pre-training over Synthetic Graphs
arXiv 2023
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
arXiv 2023
Effective Structured Prompting by Meta-Learning and Representative Verbalizer
arXiv 2023
MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model
arXiv 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
arXiv 2022
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
arXiv 2022
Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments
COLING 2022 10
Fast and Accurate Neural CRF Constituency Parsing
Frequent co-authors
10 from 60 papers