hitz-zentroa / whisper-lm
Add n-gram and large language model (LLM) support to Whisper models.
☆41 · May 6, 2025 · Updated 9 months ago
Alternatives and similar repositories for whisper-lm
Users who are interested in whisper-lm are comparing it to the libraries listed below.
- ☆43 · Sep 3, 2025 · Updated 5 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 7 months ago
- A library of speech gadgets. ☆14 · Oct 15, 2022 · Updated 3 years ago
- Evaluation of STT models for the German language ☆15 · Jan 22, 2022 · Updated 4 years ago
- PitchVC: Pitch Conditioned Any-to-Many Voice Conversion ☆36 · Jun 6, 2024 · Updated last year
- SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀 ☆20 · May 20, 2025 · Updated 8 months ago
- wav2vec2 ASR with transformers ☆16 · Oct 26, 2021 · Updated 4 years ago
- Forced alignment decoder for Whisper. ☆14 · Mar 13, 2024 · Updated last year
- A curated list of awesome voice activity detection ☆73 · Nov 22, 2024 · Updated last year
- Text-to-Speech converter for Basque and Spanish. It includes linguistic processing and built voices for the aforementioned languages. Its… ☆18 · Jan 15, 2026 · Updated last month
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E… ☆27 · Jul 11, 2025 · Updated 7 months ago
- Fine-tuning Wav2Vec2.0 on Common Voice (zh-HK) ☆16 · May 8, 2022 · Updated 3 years ago
- ☆14 · Jul 24, 2025 · Updated 6 months ago
- ☆25 · Jun 19, 2025 · Updated 7 months ago
- Multilingual speech aligner ☆76 · Nov 19, 2023 · Updated 2 years ago
- Conformer RNN-Transducer ☆14 · May 25, 2022 · Updated 3 years ago
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil… ☆40 · Jun 17, 2025 · Updated 8 months ago
- The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions ☆52 · Apr 1, 2021 · Updated 4 years ago
- A simple but performant framework for mapping speech directly to categories and intents. ☆25 · Aug 8, 2024 · Updated last year
- ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models ☆34 · Nov 18, 2025 · Updated 3 months ago
- A repo listing known open source voice tools, ordered by where they sit in the voice stack ☆27 · Sep 23, 2022 · Updated 3 years ago
- [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197) ☆39 · Sep 8, 2025 · Updated 5 months ago
- In-car multi-channel speech transcription system of AISHELL-5. ☆41 · Jun 9, 2025 · Updated 8 months ago
- An example directory for running Multi-Task Learning training on Kaldi neural networks. In Kaldi-speak, this is an egs dir for nnet3 trai… ☆55 · Jan 2, 2020 · Updated 6 years ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Feb 8, 2026 · Updated last week
- ☆40 · Jul 15, 2025 · Updated 7 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 7 months ago
- ☆98 · Jan 19, 2026 · Updated 3 weeks ago
- Distributed Optimization Infra for learning CLIP models ☆27 · Oct 3, 2024 · Updated last year
- The official implementation of the DIFFA series for dLLM-based large audio language model ☆59 · Feb 2, 2026 · Updated 2 weeks ago
- Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems ☆75 · Jan 25, 2026 · Updated 3 weeks ago
- Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC… ☆64 · Jan 27, 2026 · Updated 3 weeks ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- Sotopia-RL: Reward Design for Social Intelligence ☆46 · Jan 29, 2026 · Updated 2 weeks ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- Official Repository For VoxBlink2 ☆85 · Aug 13, 2024 · Updated last year
- Implementation of Google's USM speech model in PyTorch ☆34 · Feb 7, 2026 · Updated last week
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆99 · Jan 26, 2026 · Updated 3 weeks ago
- Target Speaker Extraction Toolkit ☆245 · Oct 4, 2025 · Updated 4 months ago