jimbozhang / xares-template

Template for creating audio encoders compatible with X-ARES

☆11

Alternatives and similar repositories for xares-template

Users who are interested in xares-template are comparing it to the libraries listed below.

jimbozhang / xares
A benchmark for evaluating audio encoders on various audio tasks.
☆28 Updated this week
ftshijt / speech_evaluation
A toolkit dedicated to speech evaluation.
☆24 Updated last year
xiaomi-research / dasheng-glap
Official Implementation of GLAP - General Language Audio Pretraining
☆50 Updated 4 months ago
nttcslab-sp / mamba-diarization
Official repository for Mamba-based Segmentation Model for Speaker Diarization
☆43 Updated 5 months ago
Hunterhuan / sphereface2_speaker_verification
Exploring Binary Classification Loss for Speaker Verification
☆18 Updated 2 years ago
MorenoLaQuatra / ARCH
ARCH: Audio Representations benCHmark
☆51 Updated last year
fcumlin / DNSMOSPro
Official implementation of DNSMOS Pro (accepted at INTERSPEECH 2024).
☆65 Updated 4 months ago
seongq / flowmse
(ICASSP 2025, official code) FlowSE: Flow Matching-based Speech Enhancement
☆68 Updated 3 months ago
X-LANCE / KWStreamingSearch
☆69 Updated 4 months ago
haoxiangsnr / llm-tse
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)
☆43 Updated 2 years ago
jishengpeng / WavReward
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆54 Updated 5 months ago
nonverbalspeech38k / nonverspeech38k
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆50 Updated last month
ASLP-lab / WenetSpeech-Chuan
Official repository for the WenetSpeech-Chuan dataset.
☆66 Updated last week
HuangZiliAndy / SSL_for_multitalker
Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings
☆31 Updated 2 years ago
ftshijt / Interspeech2024_DiscreteSpeechChallenge
This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge.
☆32 Updated last year
RicherMans / Dasheng
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
☆73 Updated 6 months ago
pengzhendong / torchfa
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆55 Updated last month
exercise-book-yq / Supercodec
☆49 Updated 7 months ago
sp-uhh / ears_benchmark
Generation scripts for EARS-WHAM and EARS-Reverb
☆38 Updated 3 months ago
kaistmm / seed-pytorch
[INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion Model"
☆51 Updated last month
merlresearch / tssep
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
☆35 Updated last year
the-bird-F / GLM-Voice-RAG
A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E2E Retrieval.
☆23 Updated 3 months ago
Beilong-Tang / TSELM
Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models
☆50 Updated 6 months ago
DaiYvhang / AISHELL-5
In-car multi-channel speech transcription system of AISHELL-5.
☆36 Updated 4 months ago
Hannieliao / Emilia-NV
Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"
☆75 Updated last month
Andong-Li-speech / Neural-Vocoders-as-Speech-Enhancers
☆51 Updated last year
hongfeixue / StutteringSpeechChallenge
SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
☆12 Updated last year
3loi / NaturalVoices
☆55 Updated last week
Shy-98 / MELLE
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
☆38 Updated 4 months ago
cheoljun95 / sdhubert
☆25 Updated 10 months ago