Pingchuan Ma
- Papers: 12
Authored papers
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
arXiv 2025
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
arXiv 2025
DepthFM: Fast Monocular Depth Estimation with Flow Matching
arXiv 2024
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
arXiv 2024
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
arXiv 2024
ROICtrl: Boosting Instance Control for Visual Generation
CVPR 2025
Large Language Models are Strong Audio-Visual Speech Recognition Learners
arXiv 2024
Diffusion Models and Representation Learning: A Survey
arXiv 2024
Does VLM Classification Benefit from LLM Description Semantics?
arXiv 2024
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
arXiv 2023
Boosting Latent Diffusion with Flow Matching
arXiv 2023
Visual Speech Recognition for Multiple Languages in the Wild
arXiv 2022
Frequent co-authors
10 from 12 papers