allenai / OLMoASR
An open-source implementation of Whisper
☆72Updated this week
Alternatives and similar repositories for OLMoASR
Users who are interested in OLMoASR are comparing it to the libraries listed below.
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate☆278Updated 3 months ago
- Kyutai with an "eye"☆215Updated 5 months ago
- ☆83Updated last week
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob…☆193Updated 3 weeks ago
- A multi-agent LLM system for detecting and resolving cognitive dissonance.☆68Updated this week
- vLLM port of the Chatterbox TTS model☆283Updated last week
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC)☆304Updated 4 months ago
- The code repository of the paper: Competition and Attraction Improve Model Fusion☆106Updated last week
- ☆154Updated 4 months ago
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya☆116Updated 3 weeks ago
- ☆132Updated last week
- Voxtral: Convert Mistral into an end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP…☆30Updated 5 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM☆271Updated 3 months ago
- ☆102Updated 2 months ago
- ☆289Updated last month
- ☆102Updated 3 months ago
- An OpenSource Deep Research library with reasoning☆155Updated last month
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers☆57Updated 3 months ago
- Collection of Open Source Speech Data☆159Updated 9 months ago
- Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol☆170Updated this week
- ☆292Updated 3 weeks ago
- ☆220Updated 3 months ago
- ☆31Updated 6 months ago
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On…☆215Updated 3 months ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨☆119Updated 3 weeks ago
- ☆631Updated last month
- ☆133Updated 2 months ago
- Inference, Fine Tuning and many more recipes with Gemma family of models☆267Updated last month
- Streaming and Fine-tuning for Chatterbox TTS☆164Updated 2 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆105Updated 5 months ago