Implementation of Google's USM speech model in Pytorch
☆35Feb 7, 2026Updated 3 weeks ago
Alternatives and similar repositories for USM
Users that are interested in USM are comparing it to the libraries listed below
Sorting:
- Forced alignment decoder for Whisper.☆14Mar 13, 2024Updated last year
- ☆14Aug 19, 2024Updated last year
- Implementation of "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs" in PyTorch☆19Feb 9, 2026Updated 3 weeks ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark☆35May 7, 2025Updated 9 months ago
- ☆14Jul 24, 2025Updated 7 months ago
- A lightweight audio codec based on a single quantizer☆32Sep 4, 2025Updated 6 months ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11…☆46Jul 2, 2024Updated last year
- ☆17Jul 22, 2024Updated last year
- TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR.☆26Jun 1, 2023Updated 2 years ago
- ConMamba for Automatic Speech Recognition☆102Aug 12, 2024Updated last year
- KittenTTS is an ultra-lightweight, CPU-friendly text-to-speech model with 15M params for real-time, high-quality voices. Open source, fas…☆23Updated this week
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Jan 27, 2025Updated last year
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX☆29Oct 15, 2024Updated last year
- Implementation of SoundtStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆13Jan 27, 2025Updated last year
- text to speech☆10Mar 19, 2024Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 10 months ago
- Getting confidences from any end-to-end systems☆11May 24, 2023Updated 2 years ago
- ☆10Apr 17, 2024Updated last year
- Target speaker automatic speech recognition (TS-ASR)☆12Oct 14, 2023Updated 2 years ago
- An open source community implementation of the model MELLE from the paper: "Autoregressive Speech Synthesis without Vector Quantization"☆14Feb 23, 2026Updated last week
- Wenet speech to text for react native☆10Nov 1, 2022Updated 3 years ago
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Jan 27, 2025Updated last year
- Collection of scripts from mHuBERT-147.☆32Nov 19, 2024Updated last year
- An open source replication of the stawberry method that leverages Monte Carlo Search with PPO and or DPO☆29Feb 23, 2026Updated last week
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition☆11Mar 14, 2025Updated 11 months ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment☆13Feb 5, 2025Updated last year
- kaldi cnn-tdnnf baseline☆13Aug 31, 2021Updated 4 years ago
- Code for the paper "FastAdaSP: An Efficient Multitask Inference Framework for Large Speech Language Models". @ EMNLP'24(Oral)☆13Nov 14, 2024Updated last year
- Openfst mirror with some fixes☆14Aug 23, 2024Updated last year
- Cantonese Grapheme-to-Phoneme Converter based on GitYCC/g2pW☆15Dec 10, 2024Updated last year
- Various agents from all of the top agent frameworks to integrate into swarms! Langchain, Griptape, CrewAI, and more!☆18Dec 22, 2025Updated 2 months ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆15Mar 11, 2024Updated last year
- Using OpenVINO to speed up MeloTTS inference☆15Nov 1, 2024Updated last year
- Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-…☆16Feb 1, 2026Updated last month
- A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts☆16Dec 3, 2024Updated last year
- ☆28Oct 7, 2025Updated 4 months ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee…☆81Jun 7, 2024Updated last year
- Whisper Speech Quality Assessment (WhiSQA)☆16Oct 14, 2025Updated 4 months ago