☆134Jan 24, 2026Updated last month
Alternatives and similar repositories for Speech-and-audio-papers-Top-Conference
Users who are interested in Speech-and-audio-papers-Top-Conference are comparing it to the repositories listed below
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆32Mar 14, 2025Updated 11 months ago
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"☆50Apr 7, 2025Updated 11 months ago
- Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa…☆22Aug 14, 2025Updated 6 months ago
- Unofficial PyTorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…☆61Apr 4, 2024Updated last year
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024☆20Jul 27, 2024Updated last year
- A Python implementation of “Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Mult…☆39Oct 11, 2024Updated last year
- A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.☆113Jun 4, 2025Updated 9 months ago
- Audio Codec Speech processing Universal PERformance Benchmark☆297Jan 8, 2026Updated 2 months ago
- Attention-Enhanced Short-Time Wiener Solution for Acoustic Echo Cancellation☆25Nov 12, 2025Updated 3 months ago
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)☆94Dec 3, 2024Updated last year
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆35Sep 9, 2025Updated 6 months ago
- Awesome speech/audio LLMs, representation learning, and codec models☆1,210Aug 13, 2025Updated 6 months ago
- Adaptive Multimodal Reasoning via Reinforcement Learning☆23Jan 11, 2026Updated last month
- Official code for Dense-TSNet☆12Sep 17, 2024Updated last year
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML 2025]☆27May 20, 2025Updated 9 months ago
- ☆18May 4, 2025Updated 10 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆154May 30, 2025Updated 9 months ago
- Self-supervised Generative LM-based Voice Conversion☆54Apr 24, 2025Updated 10 months ago
- Audio-FLAN☆159Sep 23, 2025Updated 5 months ago
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement☆100Apr 1, 2025Updated 11 months ago
- This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024)☆251Dec 12, 2025Updated 2 months ago
- Small audio language model for reasoning☆86Dec 4, 2025Updated 3 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆241Dec 18, 2025Updated 2 months ago
- Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"☆121Jul 15, 2025Updated 7 months ago
- Official baseline, dataset and evaluation scripts for the ICASSP 2026 URGENT challenge.☆33Nov 12, 2025Updated 3 months ago
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions☆84Oct 11, 2024Updated last year
- Official Implementation of LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models.☆32Nov 9, 2025Updated 4 months ago
- Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models☆56Apr 14, 2025Updated 10 months ago
- Audio Large Language Models☆882Jul 5, 2025Updated 8 months ago
- ☆60Oct 22, 2025Updated 4 months ago
- Huggingface Implementation of AV-HuBERT on the MuAViC Dataset☆18Mar 6, 2025Updated last year
- Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.☆190Nov 10, 2024Updated last year
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix☆197Feb 25, 2026Updated last week
- Updates ASR papers every day☆462Updated this week
- Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation☆24Nov 8, 2021Updated 4 years ago
- Unofficial implementation of "CPTNN: CROSS-PARALLEL TRANSFORMER NEURAL NETWORK FOR TIME-DOMAIN SPEECH ENHANCEMENT"☆15Nov 14, 2023Updated 2 years ago
- A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurI…☆153Apr 29, 2025Updated 10 months ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023☆12May 13, 2024Updated last year
- [CVPR 2025] PyTorch implementation of the paper "Hearing Anywhere in Any Environment"☆25Sep 18, 2025Updated 5 months ago