AmphionTeam/SD-Eval

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/AmphionTeam/SD-Eval)

AmphionTeam / SD-Eval

[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

☆57

Alternatives and similar repositories for SD-Eval

Users that are interested in SD-Eval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

mct10 / CoBERT
View on GitHub
Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
☆48Nov 8, 2023Updated 2 years ago
WangHelin1997 / LibriLightMix-WHAMR
View on GitHub
Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM
☆17Nov 7, 2024Updated last year
ckyang1124 / LALM-Evaluation-Survey
View on GitHub
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
☆41Aug 11, 2025Updated 11 months ago
OFA-Sys / AIR-Bench
View on GitHub
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆133Dec 9, 2024Updated last year
ajd12342 / paraspeechcaps
View on GitHub
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆162Mar 26, 2026Updated 3 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
jishengpeng / WavReward
View on GitHub
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56May 15, 2025Updated last year
yangdongchao / ALMTokenizer
View on GitHub
The demo page for ALMTokenizer
☆59Apr 14, 2025Updated last year
Aria-K-Alethia / BigCodec
View on GitHub
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆218Sep 19, 2024Updated last year
liyunlongaaa / AD-TUNING
View on GitHub
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th…
☆11Feb 23, 2024Updated 2 years ago
vivian556123 / NeurIPS2024-CoVoMix
View on GitHub
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67Jan 16, 2025Updated last year
walker-hyf / NCSSD
View on GitHub
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆61Nov 1, 2024Updated last year
slSeanWU / beats-conformer-bart-audio-captioner
View on GitHub
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆41Jan 6, 2024Updated 2 years ago
thuhcsi / SpeechCraft
View on GitHub
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆197Feb 28, 2026Updated 4 months ago
csun22 / LibriVoc-Dataset
View on GitHub
LibriVoc is a new open-source, large-scale dataset for vocoder artifact detection. LibriVoc is derived from the LibriTTS speech corpus, w…
☆16Nov 6, 2025Updated 8 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
Ruiqi-Yan / URO-Bench
View on GitHub
Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
☆55Sep 2, 2025Updated 10 months ago
0nutation / SLMTokBench
View on GitHub
SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
☆37Aug 29, 2023Updated 2 years ago
Splend1d / T5lephone
View on GitHub
Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
☆19Nov 29, 2022Updated 3 years ago
bfs18 / e2_tts
View on GitHub
☆70Sep 3, 2024Updated last year
DanielLin94144 / StyleTalk
View on GitHub
Official release of StyleTalk dataset.
☆75Jul 1, 2024Updated 2 years ago
ashi-ta / speechGLUE
View on GitHub
SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.
☆13Jun 2, 2023Updated 3 years ago
AlanBaade / SyllableLM
View on GitHub
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆63Jul 1, 2025Updated last year
slp-rl / salmon
View on GitHub
The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)
☆50Aug 15, 2025Updated 11 months ago
fengpeng-yue / speech-to-speech-translation
View on GitHub
☆25Feb 12, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
emo-box / EmoBox
View on GitHub
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
☆321Mar 18, 2026Updated 4 months ago
line / LibriTTS-P
View on GitHub
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
☆161Jun 13, 2024Updated 2 years ago
voidful / llm-codec
View on GitHub
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23May 3, 2026Updated 2 months ago
Respaired / RiFornet_Vocoder
View on GitHub
a Neural Vocoder supporting Ring Attention, Conformer and NSF.
☆25Aug 1, 2025Updated 11 months ago
xingchensong / S3Tokenizer
View on GitHub
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆517Dec 22, 2025Updated 6 months ago
cantabile-kwok / vec2wav2.0
View on GitHub
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
☆79Dec 3, 2024Updated last year
cwx-worst-one / EAT
View on GitHub
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
☆237Nov 30, 2025Updated 7 months ago
TaoRuijie / AVCleanse
View on GitHub
ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'
☆44Oct 31, 2022Updated 3 years ago
unilight / LDNet
View on GitHub
Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"
☆68Dec 13, 2021Updated 4 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
mechanicalsea / lighthubert
View on GitHub
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
☆73Sep 26, 2022Updated 3 years ago
facebookresearch / emphassess
View on GitHub
This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAsses …
☆25Jan 9, 2024Updated 2 years ago
ddlBoJack / MMAR
View on GitHub
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214Feb 25, 2026Updated 4 months ago
Alexander-H-Liu / dinosr
View on GitHub
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
☆53Jan 18, 2024Updated 2 years ago
ddlBoJack / MT4SSL
View on GitHub
[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…
☆45Mar 25, 2024Updated 2 years ago
hainan-xv / PASM
View on GitHub
Pronunciation-assisted Subword Modeling
☆31May 30, 2019Updated 7 years ago
xiquan-li / MeanAudio
View on GitHub
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆142Sep 2, 2025Updated 10 months ago