HuangZiliAndy/SSL_for_multitalker

ADAPTING SELF-SUPERVISED MODELS TO MULTI-TALKER SPEECH RECOGNITION USING SPEAKER EMBEDDINGS

☆33

Alternatives and similar repositories for SSL_for_multitalker

Users who are interested in SSL_for_multitalker are comparing it to the repositories listed below.

Aisaka0v0 / TS-Whisper
☆33 · Jun 12, 2025 · Updated last year
LingweiMeng / Whisper-Sidecar
The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".
☆34 · Aug 2, 2025 · Updated 11 months ago
X-LANCE / public_talks
Materials of public talks given By SJTU X-LANCE members
☆14 · Dec 3, 2022 · Updated 3 years ago
kjw11 / CSEnet-ASR
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12 · Mar 14, 2025 · Updated last year
asteroid-team / pytorch-pit
Permutation invariant training in PyTorch
☆13 · Oct 2, 2020 · Updated 5 years ago
kjw11 / Speaker-Aware-CTC
Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.
☆22 · May 26, 2025 · Updated last year
desh2608 / gss
A simple package for Guided source separation (GSS)
☆134 · May 20, 2024 · Updated 2 years ago
cuhealthybrains / MT-LLM
The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"
☆51 · Apr 7, 2025 · Updated last year
wavlab-speech / shinjiwlab.github.io
☆18 · Updated this week
mutiann / speech_rankings
A CSRankings-like index for speech researchers
☆35 · Oct 16, 2024 · Updated last year
huangruizhe / ConEC
☆14 · Jun 17, 2024 · Updated 2 years ago
isca-sig-rosp / ISCA-SIG-RoSP
Web page for ISCA Special Interest Group: Robust Speech Processing (RoSP)
☆11 · Dec 4, 2023 · Updated 2 years ago
MagicHub-io / CSASR_Challenge
☆11 · Sep 26, 2022 · Updated 3 years ago
liyunlongaaa / AD-TUNING
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th…
☆11 · Feb 23, 2024 · Updated 2 years ago
NaoyukiKanda / LibriSpeechMix
☆38 · Mar 30, 2021 · Updated 5 years ago
RicherMans / Dasheng
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
☆86 · Nov 7, 2025 · Updated 8 months ago
ICASSP2021-tutorial9 / Distant_conversational_ASR_and_analysis
☆12 · Jun 10, 2021 · Updated 5 years ago
etzinis / optimal_condition_training
Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S…
☆14 · Feb 15, 2023 · Updated 3 years ago
chenzhuo1011 / libri_css
Libri-CSS: dataset and evaluation pipeline
☆157 · Jan 18, 2023 · Updated 3 years ago
felixfuyihui / AISHELL-4
☆140 · Jul 21, 2021 · Updated 5 years ago
light1726 / Speech-Tokenization-Papers
This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language…
☆15 · Dec 1, 2023 · Updated 2 years ago
xinjli / phonepiece
phone inventory library
☆17 · May 15, 2023 · Updated 3 years ago
k2-fsa / libriheavy
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
☆220 · Sep 10, 2024 · Updated last year
awthomp / cusignal-icassp-tutorial
4 Hour cuSignal Tutorial - ICASSP 2021 Notebooks
☆49 · Jun 7, 2021 · Updated 5 years ago
nickjw0205 / Improving-ASR-with-LLM-Description
☆20 · Sep 2, 2024 · Updated last year
jsalt2020-asrdiar / jsalt2020_simulate
Training data simulation
☆60 · May 6, 2024 · Updated 2 years ago
skit-ai / slu-prosody
Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202…
☆27 · May 17, 2023 · Updated 3 years ago
X-LANCE / UniCATS-CTX-vec2wav
[AAAI 2024] Code for CTX-vec2wav in UniCATS
☆130 · Jun 11, 2024 · Updated 2 years ago
MiscellaneousStuff / PhoneLM
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48 · Sep 4, 2023 · Updated 2 years ago
mct10 / CoBERT
Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
☆48 · Nov 8, 2023 · Updated 2 years ago
BUTSpeechFIT / EEND
☆95 · Apr 24, 2025 · Updated last year
rithiksachdev / PostASR-Correction-SLT2024
☆18 · Jul 22, 2024 · Updated 2 years ago
danpovey / quantization
Torch-based tool for quantizing high-dimensional vectors using additive codebooks
☆54 · May 25, 2022 · Updated 4 years ago
Speech-Lab-IITM / data2vec-aqc
Repository having the code and models from the paper: data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student traini…
☆13 · Mar 18, 2024 · Updated 2 years ago
fgnt / ci_sdr
☆53 · May 15, 2025 · Updated last year
the-bird-F / GLM-Voice-RAG
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31 · Jul 11, 2025 · Updated last year
Hunterhuan / sphereface2_speaker_verification
Exploring Binary Classification Loss for Speaker Verification
☆18 · Jul 18, 2023 · Updated 3 years ago
BUTSpeechFIT / speakerbeam
☆145 · Oct 25, 2021 · Updated 4 years ago
idiap / icassp-oov-recognition
Data and code related to the ICASSP submission "A comparison of methods for OOV-word recognition"
☆17 · Nov 28, 2021 · Updated 4 years ago