tincans-ai / gazelle
Joint speech-language model - respond directly to audio!
☆356 Updated 4 months ago
Related projects
Alternatives and complementary repositories for gazelle
- ☆192 Updated 5 months ago
- ☆258 Updated 5 months ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆221 Updated 6 months ago
- On-device intelligence. ☆196 Updated 2 months ago
- Interface for OuteTTS models. ☆409 Updated 2 weeks ago
- Implementation of F5-TTS in MLX ☆332 Updated 3 weeks ago
- Collection of Open Source Speech Data ☆147 Updated 2 weeks ago
- A ggml (C++) re-implementation of tortoise-tts ☆160 Updated 3 months ago
- Video+code lecture on building nanoGPT from scratch ☆64 Updated 5 months ago
- Whisper with Medusa heads ☆800 Updated 3 weeks ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆86 Updated last month
- ☆229 Updated last month
- Joint speech-language model - respond directly to audio! ☆30 Updated 6 months ago
- On-device streaming text-to-speech engine powered by deep learning ☆57 Updated this week
- ☆254 Updated 8 months ago
- ☆87 Updated 6 months ago
- 🐮📢 The first AI voice assistant that interrupts *you* ☆131 Updated 2 months ago
- Port of Suno's Bark TTS transformer in Apple's MLX Framework ☆72 Updated 9 months ago
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon ☆237 Updated 2 months ago
- An extremely fast implementation of whisper optimized for Apple Silicon using MLX. ☆585 Updated 6 months ago
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆138 Updated 4 months ago
- A fast multimodal LLM for real-time voice ☆1,401 Updated this week
- ☆308 Updated 2 months ago
- ☆462 Updated 5 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation ☆103 Updated 9 months ago
- ☆254 Updated 5 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆113 Updated 3 weeks ago
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆439 Updated this week
- Embed arbitrary modalities (images, audio, documents, etc) into large language models. ☆176 Updated 7 months ago
- Blazing fast whisper turbo for ASR (speech-to-text) tasks ☆162 Updated last month