aliencaocao / TIL-2023
Champion at Brainhack TIL 2023: Team 10000SGDMRT
☆18Updated last year
Alternatives and similar repositories for TIL-2023
Users that are interested in TIL-2023 are comparing it to the libraries listed below
- ☆310Updated last year
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.☆345Updated 2 years ago
- Create a dataset from a list of YouTube links easily☆21Updated 2 years ago
- A curated list of awesome voice activity detection☆66Updated 10 months ago
- Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for the Whisper-medium, designed to enhance performance on mul…☆16Updated 9 months ago
- Finetune VITS and MMS using HuggingFace's tools☆166Updated last year
- ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription☆60Updated last week
- Champion at Brainhack TIL 2022: Team 8000SGD_CAT☆13Updated last year
- PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023.☆177Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.☆89Updated 2 years ago
- ☆85Updated last year
- ☆49Updated 2 years ago
- A simple, hackable text-to-speech system in PyTorch and MLX☆175Updated 2 months ago
- Whisper finetuned on VinBigdata-VLSP2020-100h + KenLM☆38Updated 2 years ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆68Updated last month
- ☆145Updated last week
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation☆182Updated 2 months ago
- Official repository of SepReformer for speech separation☆225Updated 9 months ago
- The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"☆179Updated 3 weeks ago
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO☆65Updated 2 years ago
- [EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation☆130Updated 5 months ago
- ☆378Updated last year
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨☆127Updated 2 months ago
- Vi_G2P or ViG2P: G2P package for Vietnamese: based on vPhon and phonology knowledge to convert Raw text - Graphoneme to IPA☆98Updated last year
- The Hugging Face Course on Transformers for Audio☆458Updated last week
- A speaker gender classifier. MFC feature engineering and a pre-trained ResNet-50. GradCAM interpretation.☆27Updated 3 years ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆149Updated last year
- This is the audio sample repository for speech separation model "MossFormer2".☆151Updated 10 months ago
- Speaker change detection using SincNet and an LSTM/Transformer☆53Updated 4 months ago
- ☆39Updated 3 years ago