Lei Xie
- Papers: 24
Authored papers (24)
SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
arXiv 2026
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
arXiv 2025
SongEval: A Benchmark Dataset for Song Aesthetics Evaluation
arXiv 2025
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
arXiv 2025
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
arXiv 2025
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
arXiv 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
arXiv 2025
Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought
arXiv 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
arXiv 2025
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
arXiv 2025
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
CVPR 2025
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
arXiv 2024
Autoregressive Speech Synthesis with Next-Distribution Prediction
arXiv 2024
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
arXiv 2024
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
arXiv 2024
Learning Multi-view Anomaly Detection
arXiv 2024
The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023
arXiv 2024
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection
arXiv 2024
SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation
arXiv 2023
WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit
arXiv 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
arXiv 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
arXiv 2021
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
arXiv 2021
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
arXiv 2020
Frequent co-authors
10 from 24 papers