Yuan Gong
- Papers: 18
Authored papers (18)
BabyVision: Visual Reasoning Beyond Language
arXiv 2026
$OneMillion-Bench: How Far are Language Agents from Human Experts?
arXiv 2026
OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
arXiv 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
arXiv 2025
DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
arXiv 2024
Joint Audio and Speech Understanding
arXiv 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
arXiv 2023
TaleCrafter: Interactive Story Visualization with Multiple Characters
arXiv 2023
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
arXiv 2023
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
arXiv 2023
Contrastive Audio-Visual Masked Autoencoder
arXiv 2022
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
arXiv 2022
Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network
arXiv 2022
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
CVPR 2023
MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
arXiv 2022
AST: Audio Spectrogram Transformer
arXiv 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
arXiv 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
arXiv 2021
Frequent co-authors
10 from 18 papers