Extracts the voices of different speakers from a video and saves them separately to produce audio training data
☆29May 23, 2024Updated last year
Alternatives and similar repositories for speaker-diarization
Users that are interested in speaker-diarization are comparing it to the libraries listed below
- C++ version of pyannote audio overlapped speech detection pipeline☆13Feb 14, 2024Updated 2 years ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence.☆28Dec 30, 2025Updated 2 months ago
- A local knowledge base built with LangChain + Xinference + Chroma☆12Jun 13, 2025Updated 9 months ago
- EaseVoice Trainer is a simple and user-friendly voice cloning and speech model trainer.☆14Apr 27, 2025Updated 10 months ago
- Code for "Error-driven Fixed-Budget ASR Personalization for Accented Speakers" in ICASSP 2021☆11Jun 13, 2021Updated 4 years ago
- UzTransliterator | State-of-the-art machine transliteration tool for Uzbek language☆13Jan 6, 2026Updated 2 months ago
- ☆12Aug 15, 2022Updated 3 years ago
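- A batch inference tool that runs inference on the same text multiple times, with support for randomized parameters, until the most satisfactory result is selected.☆11Aug 19, 2024Updated last year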
- PyTorch implementation of TinyWASE described in our paper "Compressing Speaker Extraction Model with Ultra-low Precision Quantization and…☆11Jun 28, 2021Updated 4 years ago
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025☆25Jan 25, 2025Updated last year
- Source code for "BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement"☆14Feb 13, 2022Updated 4 years ago
- A lightweight tool that efficiently isolates target speaker data from your datasets.☆19Nov 23, 2024Updated last year
- The power-law compressed phase-aware asymmetric (PLCPA-ASYM) loss☆14Sep 4, 2023Updated 2 years ago
- This is a project of Interspeech2021 paper "SpecMix : A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Fea…☆11Sep 27, 2022Updated 3 years ago
- Optimizing Source and Sensor Placement for Sound Field Control☆16Mar 27, 2023Updated 2 years ago
- An unofficial non-causal Tensorflow implementation of "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Spee…☆14Dec 27, 2022Updated 3 years ago
- Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems☆90Jan 25, 2026Updated last month
- ☆15Sep 16, 2024Updated last year
- A code repository for the accepted paper entitled "Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain…☆18Feb 17, 2025Updated last year
- Cross-Layer Similarity Knowledge Distillation for Speech Enhancement☆11Jun 22, 2023Updated 2 years ago
- ☆12May 22, 2023Updated 2 years ago
- SOFiA - Sound Field Analysis Toolbox for Matlab☆24Feb 21, 2024Updated 2 years ago
- Streaming Audiotransformers for online Audio tagging☆53Jun 14, 2024Updated last year
- A simple tool to convert Roman script to Indic (Devanagari) script. As most keyboards are English and to write in Indic script is di…☆13Aug 31, 2016Updated 9 years ago
- Dan Povey's local copy of kaldi-asr/kaldi☆19Nov 10, 2023Updated 2 years ago
- Official code for paper:"Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding"☆36Jan 28, 2026Updated last month
- A reference audio screening tool based on the GptSoVits project☆23Aug 17, 2025Updated 7 months ago
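- LeetCode practice guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of difficult points, and more than 50 mind maps, so studying algorithms is no longer confusing! 🔥🔥 Take a look; you'll wish you had found it sooner! 🚀☆15Jul 12, 2021Updated 4 years ago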
- An implementation of rnn transducer for sequence labeling problem☆22Feb 24, 2018Updated 8 years ago
- ☆16Sep 12, 2023Updated 2 years ago
- AnyEdit: Edit Any Knowledge Encoded in Language Models, ICML 2025☆46Nov 6, 2025Updated 4 months ago
- From the paper "Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition"☆27Nov 20, 2024Updated last year
- Non-Uniform FFT on the CPU and GPU (1D, 2D and 3D)☆14Jan 13, 2021Updated 5 years ago
- Convolution and Transposed Convolution in a Matrix Multiplication View☆15Apr 18, 2021Updated 4 years ago
- Chat with your MySQL database using the Llama 3 LLM and LangChain☆28Jul 17, 2024Updated last year
- ☆21Jul 16, 2025Updated 8 months ago
- The implementation of G2Net, an extension of GaGNet, in submission to T-ASLP☆19Apr 27, 2022Updated 3 years ago
- Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models☆57Apr 14, 2025Updated 11 months ago
- ☆23Oct 20, 2021Updated 4 years ago