tincans-ai / gazelle
Joint speech-language model - respond directly to audio!
☆368 · Updated 9 months ago
Alternatives and similar repositories for gazelle:
Users who are interested in gazelle are comparing it to the libraries listed below.
- ☆204 · Updated 10 months ago
- On-device intelligence. ☆330 · Updated 3 weeks ago
- ☆281 · Updated 10 months ago
- Joint speech-language model - respond directly to audio! ☆30 · Updated 11 months ago
- Video+code lecture on building nanoGPT from scratch ☆66 · Updated 10 months ago
- ☆189 · Updated last week
- ☆354 · Updated 7 months ago
- A ggml (C++) re-implementation of tortoise-tts ☆178 · Updated 7 months ago
- ☆646 · Updated last week
- Open source conversation framework and visual editor for structured Pipecat dialogues ☆285 · Updated last week
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆543 · Updated 4 months ago
- ☆254 · Updated last year
- G2P ☆208 · Updated this week
- ☆269 · Updated 10 months ago
- Real-Time Voice Inference Web SDK ☆212 · Updated this week
- ☆156 · Updated last year
- Whisper realtime streaming for long speech-to-text transcription and translation ☆113 · Updated last year
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆233 · Updated 3 weeks ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ☆94 · Updated 11 months ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆160 · Updated 3 weeks ago
- ☆482 · Updated 10 months ago
- A simple, hackable text-to-speech system in PyTorch and MLX ☆148 · Updated last month
- an implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆223 · Updated 11 months ago
- Whisper with Medusa heads ☆830 · Updated last month
- An MLX project to train a base model on your WhatsApp chats using (Q)LoRA finetuning ☆166 · Updated last year
- Official implementation of "WhisperNER: Unified Open Named Entity and Speech Recognition" ☆186 · Updated last month
- On-device streaming text-to-speech engine powered by deep learning ☆76 · Updated this week
- Run PaliGemma in real time ☆131 · Updated 10 months ago
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability. ☆231 · Updated 7 months ago