
Lei Xie

Papers: 24

Authored papers

SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

arXiv 2026

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

arXiv 2025

SongEval: A Benchmark Dataset for Song Aesthetics Evaluation

arXiv 2025

OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

arXiv 2025

Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models

arXiv 2025

OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing

arXiv 2025

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

arXiv 2025

Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought

arXiv 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

arXiv 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

arXiv 2025

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

CVPR 2025 1

2024

MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

arXiv 2024

Autoregressive Speech Synthesis with Next-Distribution Prediction

arXiv 2024

FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

arXiv 2024

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

arXiv 2024

Learning Multi-view Anomaly Detection

arXiv 2024

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

arXiv 2024

A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection

arXiv 2024

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

arXiv 2023

WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

arXiv 2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

arXiv 2021

The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods

arXiv 2021

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

arXiv 2021

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

arXiv 2020

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 24 papers)