haoxiangsnr/llm-tse

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)

☆43

Alternatives and similar repositories for llm-tse

Users who are interested in llm-tse are comparing it to the repositories listed below.

WangHelin1997 / LibriLightMix-WHAMR
Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM
☆17 · Nov 7, 2024 · Updated last year
LiChenda / Multi-clue-TSE-data
Data simulation scripts for paper "Target Sound Extraction with Variable Cross-modality Clues"
☆17 · May 19, 2023 · Updated 3 years ago
yangdongchao / Tim-TSENet
The source code of Tim-TSENet
☆15 · Apr 22, 2022 · Updated 4 years ago
JusperLee / Swift-Net
Power-Guided Grouped SRU for Real-Time Causal Audio-Visual Speech Separation
☆26 · Jul 20, 2026 · Updated last week
Audio-AGI / dcase2024_task9_baseline
Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"
☆26 · Mar 27, 2024 · Updated 2 years ago
zengchang233 / CrossSinger
The source code for the paper CrossSinger (ASRU 2023)
☆18 · Oct 12, 2023 · Updated 2 years ago
fakufaku / diffusion-separation
Single channel speech source separation by diffusion process (ICASSP 2023)
☆126 · Mar 15, 2024 · Updated 2 years ago
JusperLee / SPMamba
☆227 · Dec 5, 2024 · Updated last year
xiaoxiaomiao323 / MSA
☆16 · Feb 19, 2026 · Updated 5 months ago
yangdongchao / ALMTokenizer2
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45 · Sep 5, 2025 · Updated 10 months ago
NaoyukiKanda / LibriSpeechMix
☆38 · Mar 30, 2021 · Updated 5 years ago
JusperLee / Gull-Codec-Training
☆12 · Mar 11, 2025 · Updated last year
Beilong-Tang / TSELM
Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models
☆60 · Apr 14, 2025 · Updated last year
Sreyan88 / ReCLAP
☆33 · Dec 23, 2025 · Updated 7 months ago
JusperLee / TFACM
☆24 · Jul 16, 2025 · Updated last year
WangHelin1997 / AT-GCN
PyTorch implementation of the paper: Modeling Label Dependencies for Audio Tagging with Graph Convolutional Network
☆15 · Sep 18, 2020 · Updated 5 years ago
WangHelin1997 / SoloAudio
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
☆121 · Jan 28, 2026 · Updated 6 months ago
sp-uhh / sgmse-bbed
Brownian Bridge with Exponential Diffusion Coefficient
☆43 · Nov 1, 2023 · Updated 2 years ago
etzinis / heterogeneous_separation
Code and data recipes for the paper: Heterogeneous Target Speech Separation
☆44 · Dec 6, 2022 · Updated 3 years ago
TomJwYu / WenetSpeechSpeakerCluster
☆55 · Jul 17, 2023 · Updated 3 years ago
Aisaka0v0 / CLAPSep
Query-conditioned target sound extraction model
☆30 · Mar 25, 2025 · Updated last year
ga642381 / Speech-Prompts-Adapters
This repository surveys papers focusing on prompting and adapters for speech processing.
☆113 · Aug 4, 2023 · Updated 2 years ago
habla-liaa / encodecmae
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
☆101 · Jul 24, 2024 · Updated 2 years ago
Sanyuan-Chen / CSS_with_Conformer
Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.
☆120 · Mar 18, 2023 · Updated 3 years ago
yukara-ikemiya / floss-torch
PyTorch implementation of "Source Separation by Flow Matching (FLOSS)" by Google DeepMind
☆97 · Nov 24, 2025 · Updated 8 months ago
aispeech-lab / LiMuSE
PyTorch implementation of LiMuSE
☆33 · Oct 11, 2022 · Updated 3 years ago
WingSingFung / TISDiSS
Official implementation of TISDiSS, a scalable framework for discriminative source separation.
☆16 · Oct 19, 2025 · Updated 9 months ago
BUTSpeechFIT / speakerbeam
☆145 · Oct 25, 2021 · Updated 4 years ago
primepake / dac_vae
Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder
☆38 · Aug 30, 2025 · Updated 10 months ago
Audio-WestlakeU / RVAE-EM
Official PyTorch implementation of "RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutiv…
☆51 · Mar 6, 2025 · Updated last year
fakufaku / fast_bss_eval
A fast implementation of bss_eval metrics for blind source separation
☆149 · Mar 11, 2026 · Updated 4 months ago
Audio-WestlakeU / McNet
The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023
☆130 · Mar 24, 2023 · Updated 3 years ago
facebookresearch / spidr-adapt
This repository contains the checkpoints and training code for the few-shot adaptation speech models in the SpidR-Adapt paper.
☆23 · Dec 29, 2025 · Updated 6 months ago
tencent-ailab / FRA-RIR
☆214 · Dec 4, 2023 · Updated 2 years ago
yzGuu830 / efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126 · Mar 20, 2025 · Updated last year
mutiann / neural-lexicon-reader
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge
☆21 · Jul 25, 2022 · Updated 4 years ago
aispeech-lab / TinyWASE
PyTorch implementation of TinyWASE described in our paper "Compressing Speaker Extraction Model with Ultra-low Precision Quantization and…
☆11 · Jun 28, 2021 · Updated 5 years ago
LAION-AI / emotional-speech-annotations
This repository contains prompts and best practices for annotating audio clips in a very high degree of detail using Audio-Language-Models
☆35 · Oct 13, 2024 · Updated last year
donghoney0416 / DeepASA
Official page of "DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis"
☆26 · Apr 15, 2026 · Updated 3 months ago