Implementation of Google's USM speech model in Pytorch
☆35Apr 13, 2026Updated this week
Alternatives and similar repositories for USM
Users that are interested in USM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark☆36May 7, 2025Updated 11 months ago
- ☆17Jul 22, 2024Updated last year
- Interface Design for Self-Supervised Speech Models, Accepted to Interspeech2024☆16Nov 19, 2024Updated last year
- Implementation of SoundtStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆13Jan 27, 2025Updated last year
- Various agents from all of the top agent frameworks to integrate into swarms! Langchain, Griptape, CrewAI, and more!☆18Dec 22, 2025Updated 3 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR.☆34Jun 1, 2023Updated 2 years ago
- ConMamba for Automatic Speech Recognition☆103Aug 12, 2024Updated last year
- An open source replication of the stawberry method that leverages Monte Carlo Search with PPO and or DPO☆29Apr 6, 2026Updated last week
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition☆12Mar 14, 2025Updated last year
- Forced alignment decoder for Whisper.☆15Mar 13, 2024Updated 2 years ago
- A lightweight audio codec based on a single quantizer☆34Sep 4, 2025Updated 7 months ago
- Implementation of "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs" in PyTorch☆19Mar 30, 2026Updated 2 weeks ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee…☆82Jun 7, 2024Updated last year
- ☆28Oct 7, 2025Updated 6 months ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- An open source community implementation of the model MELLE from the paper: "Autoregressive Speech Synthesis without Vector Quantization"☆14Updated this week
- A forest of autonomous agents.☆20Jan 27, 2025Updated last year
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- ☆14Aug 19, 2024Updated last year
- ☆17May 5, 2024Updated last year
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆15Mar 11, 2024Updated 2 years ago
- ☆14Jul 24, 2025Updated 8 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Jan 27, 2025Updated last year
- The accompanying code for "Exploring the limits of decoder-only models trained on public speech recognition corpora" (Ankit Gupta, George…☆20Oct 11, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX☆29Oct 15, 2024Updated last year
- A streaming audio reader, processor, and writer built on top of soundfile, and PyAV (bindings for FFmpeg)☆38Mar 31, 2026Updated 2 weeks ago
- ☆16Mar 12, 2024Updated 2 years ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11…☆46Jul 2, 2024Updated last year
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated last year
- Target speaker automatic speech recognition (TS-ASR)☆13Oct 14, 2023Updated 2 years ago
- kaldi cnn-tdnnf baseline☆13Aug 31, 2021Updated 4 years ago
- Wenet speech to text for react native☆10Nov 1, 2022Updated 3 years ago
- Code for the ICML 2021 paper "Sharing Less is More: Lifelong Learning in Deep Networks with Selective Layer Transfer"☆12Aug 17, 2021Updated 4 years ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- [NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching☆125Updated this week
- Getting confidences from any end-to-end systems☆11May 24, 2023Updated 2 years ago
- ☆61Nov 4, 2023Updated 2 years ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Nov 11, 2024Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆61Sep 5, 2025Updated 7 months ago
- text to speech☆10Mar 19, 2024Updated 2 years ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment☆14Feb 5, 2025Updated last year