WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models
☆22Feb 13, 2026Updated 2 weeks ago
Alternatives and similar repositories for WavBench
Users who are interested in WavBench are comparing it to the libraries listed below
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…☆27Jul 11, 2025Updated 7 months ago
- Psychoacoustic Loss Function☆17Mar 20, 2025Updated 11 months ago
- noise reduction☆17Jul 3, 2024Updated last year
- Attention-Enhanced Short-Time Wiener Solution for Acoustic Echo Cancellation☆24Nov 12, 2025Updated 3 months ago
- ☆23Oct 17, 2024Updated last year
- MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music☆26Jan 7, 2026Updated last month
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…☆27Mar 5, 2024Updated last year
- ☆24Sep 10, 2025Updated 5 months ago
- Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems☆80Jan 25, 2026Updated last month
- (WIP) Long-form speech generation☆31Apr 2, 2025Updated 10 months ago
- A Python algorithm to change the pitch of the voice in real time☆13Dec 13, 2020Updated 5 years ago
- Voxtral: Convert Mistral into an end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP…☆42Mar 7, 2025Updated 11 months ago
- arxiv daily for speech translation, legal. Ref: Vincentqyw/cv-arxiv-daily☆15Jan 6, 2025Updated last year
- Speech Emotion Recognition using Deep Learning☆12May 24, 2021Updated 4 years ago
- TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks☆22Jan 19, 2026Updated last month
- ☆37Jul 4, 2024Updated last year
- ☆52Dec 7, 2025Updated 2 months ago
- Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming☆15Aug 20, 2024Updated last year
- Eureka-Audio: A 1.7B lightweight audio–language model that matches 7B–30B models on ASR, audio understanding, and paralinguistic reasonin…☆25Updated this week
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆16Nov 19, 2025Updated 3 months ago
- Improving Symbolic Music Generation with Inference-Time Alignment☆20Aug 2, 2025Updated 6 months ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 3 months ago
- [ACM-MM 2025 Workshop] More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment.☆25Nov 25, 2025Updated 3 months ago
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge☆21Jul 25, 2022Updated 3 years ago
- A summary of some C++ fundamentals☆10Oct 28, 2020Updated 5 years ago
- Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline.☆17May 9, 2025Updated 9 months ago
- Generator for anechoic, non-stationary noise signals☆11Aug 12, 2022Updated 3 years ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder☆12Mar 11, 2025Updated 11 months ago
- A lightweight muji-moe chatbot created by Reecho.ai.☆12Oct 1, 2024Updated last year
- This is the official repository of Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation.☆12Sep 25, 2024Updated last year
- ☆11Nov 7, 2024Updated last year
- Sound2Synth Plug-Ins☆13Jul 28, 2022Updated 3 years ago
- Basic library for spatial audio SOFA files☆12Sep 29, 2020Updated 5 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Jun 27, 2025Updated 8 months ago
- Model for selecting perceptually relevant early reflections for parametric spatial sound rendering☆13Oct 26, 2023Updated 2 years ago
- Speech enhancement | Beamforming | NN Mask Estimation | LSTM | DTLN☆15Mar 8, 2023Updated 2 years ago
- ☆32Nov 18, 2025Updated 3 months ago
- Automatically setup the AISHELL-4 and MSDWild dataset for usage with pyannote-database (and pyannote-audio)☆15Oct 22, 2025Updated 4 months ago
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 7 months ago