CouncilDataProject / speakerbox
Speakerbox: Fine-tune Audio Transformers for speaker identification.
☆56 · Updated 4 months ago
Alternatives and similar repositories for speakerbox:
Users who are interested in speakerbox are comparing it to the libraries listed below.
- Speaker change detection using SincNet and an LSTM/Transformer ☆50 · Updated 9 months ago
- pyannote + notebook = pyannotebook ☆26 · Updated last year
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 · Updated last year
- ☆24 · Updated last year
- A mini, simple, and fast end-to-end automatic speech recognition toolkit. ☆50 · Updated 2 years ago
- NOTSOFAR-1 Challenge: Distant Diarization and ASR ☆52 · Updated 2 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆81 · Updated last year
- Clustering-based methods for overlapping diarization ☆81 · Updated last year
- Machine learning speaker characteristics ☆33 · Updated 2 weeks ago
- A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories. ☆21 · Updated last month
- Phoneme alignment representation compatible with multiple forced aligners ☆21 · Updated last year
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆100 · Updated 2 months ago
- Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments. ☆17 · Updated 3 weeks ago
- S3PRL for Speech Emotion Recognition (see s3prl > downstream) ☆15 · Updated 2 months ago
- 56 language, 1 model Multilingual ASR ☆25 · Updated 3 years ago
- Deep Speech Distances PyTorch ☆28 · Updated 3 years ago
- A toolkit to calculate speech audio quality. Not affiliated with the original authors ☆50 · Updated 8 months ago
- Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation ☆15 · Updated last year
- An easy way to fine-tune Wav2Vec 2.0 for low-resource languages. ☆82 · Updated last year
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… ☆21 · Updated 7 months ago
- Code for the method proposed in the paper:- ccc-wav2vec 2.0: Clustering aided Cross-Contrastive learning of Self-Supervised speech repres… ☆21 · Updated last year
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model ☆32 · Updated last year
- Adapting a ConvNeXt model to audio classification on AudioSet ☆22 · Updated 2 months ago
- Simple Python package for fast DER computation ☆33 · Updated last year
- Convert English text from written expressions into spoken forms ☆25 · Updated 2 years ago
- This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025. ☆28 · Updated last week
- Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding ☆75 · Updated 3 years ago
- Rescoring methods for end-to-end Automatic Speech Recognition ☆27 · Updated 4 years ago
- ☆38 · Updated 3 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Updated last year