01Zhangbw / Speech-and-audio-papers-Top-Conference
☆131 Jan 24, 2026 Updated 3 weeks ago
Alternatives and similar repositories for Speech-and-audio-papers-Top-Conference
Users who are interested in Speech-and-audio-papers-Top-Conference are comparing it to the repositories listed below
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆32 Mar 14, 2025 Updated 11 months ago
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions" ☆50 Apr 7, 2025 Updated 10 months ago
- Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa… ☆22 Aug 14, 2025 Updated 6 months ago
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (… ☆61 Apr 4, 2024 Updated last year
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024 ☆20 Jul 27, 2024 Updated last year
- A python implementation of “Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Mult… ☆39 Oct 11, 2024 Updated last year
- A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability. ☆113 Jun 4, 2025 Updated 8 months ago
- Audio Codec Speech processing Universal PERformance Benchmark ☆296 Jan 8, 2026 Updated last month
- Attention-Enhanced Short-Time Wiener Solution for Acoustic Echo Cancellation ☆23 Nov 12, 2025 Updated 3 months ago
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024) ☆94 Dec 3, 2024 Updated last year
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion ☆33 Sep 9, 2025 Updated 5 months ago
- Awesome speech/audio LLMs, representation learning, and codec models ☆1,209 Aug 13, 2025 Updated 6 months ago
- official code for Dense-TSNet ☆12 Sep 17, 2024 Updated last year
- ☆18 May 4, 2025 Updated 9 months ago
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML 2025] ☆27 May 20, 2025 Updated 8 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆154 May 30, 2025 Updated 8 months ago
- Self-supervised Generative LM-based Voice Conversion ☆54 Apr 24, 2025 Updated 9 months ago
- Audio-FLAN ☆160 Sep 23, 2025 Updated 4 months ago
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement ☆97 Apr 1, 2025 Updated 10 months ago
- This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024) ☆249 Dec 12, 2025 Updated 2 months ago
- small audio language model for reasoning ☆86 Dec 4, 2025 Updated 2 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆239 Dec 18, 2025 Updated 2 months ago
- Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data" ☆120 Jul 15, 2025 Updated 7 months ago
- Official baseline, dataset and evaluation scripts for the ICASSP 2026 URGENT challenge. ☆32 Nov 12, 2025 Updated 3 months ago
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions ☆83 Oct 11, 2024 Updated last year
- Official Implementation of LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models. ☆32 Nov 9, 2025 Updated 3 months ago
- Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models ☆55 Apr 14, 2025 Updated 10 months ago
- Audio Large Language Models ☆868 Jul 5, 2025 Updated 7 months ago
- Huggingface Implementation of AV-HuBERT on the MuAViC Dataset ☆17 Mar 6, 2025 Updated 11 months ago
- ☆59 Oct 22, 2025 Updated 3 months ago
- Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System. ☆190 Nov 10, 2024 Updated last year
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆195 Dec 13, 2025 Updated 2 months ago
- Update ASR papers every day ☆452 Updated this week
- Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation ☆24 Nov 8, 2021 Updated 4 years ago
- Power-Guided Grouped SRU for Real-Time Causal Audio-Visual Speech Separation ☆23 Nov 4, 2025 Updated 3 months ago
- unofficial implementation of "CPTNN: CROSS-PARALLEL TRANSFORMER NEURAL NETWORK FOR TIME-DOMAIN SPEECH ENHANCEMENT" ☆15 Nov 14, 2023 Updated 2 years ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023 ☆12 May 13, 2024 Updated last year
- [CVPR 2025] Pytorch implementation of the paper "Hearing Anywhere in Any Environment" ☆25 Sep 18, 2025 Updated 5 months ago
- A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurI… ☆151 Apr 29, 2025 Updated 9 months ago