collabora/WhisperFusion



WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.

☆1,646

Alternatives and similar repositories for WhisperFusion

Users that are interested in WhisperFusion are comparing it to the libraries listed below.


WhisperSpeech / WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
☆4,625 Dec 14, 2025 Updated 7 months ago
collabora / WhisperLive
A nearly-live implementation of OpenAI's Whisper.
☆4,145 Updated this week
yl4579 / StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
☆6,314 Aug 10, 2024 Updated last year
metavoiceio / metavoice-src
Foundational model for human-like, expressive TTS
☆4,203 Jul 30, 2024 Updated last year
huggingface / distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
☆4,091 Jan 8, 2025 Updated last year
fixie-ai / ultravox
A fast multimodal LLM for real-time voice
☆4,476 Dec 12, 2025 Updated 7 months ago
myshell-ai / OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
☆36,984 Apr 19, 2025 Updated last year
huggingface / parler-tts
Inference and training library for high-quality TTS models.
☆5,582 Dec 10, 2024 Updated last year
ictnlp / LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…
☆3,141 May 19, 2025 Updated last year
jasonppy / VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
☆8,496 May 30, 2026 Updated last month
lavague-ai / LaVague
Large Action Model framework to develop AI Web Agents
☆6,381 Jan 21, 2025 Updated last year
kyutai-labs / moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…
☆10,646 May 16, 2026 Updated 2 months ago
myshell-ai / MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
☆7,546 Dec 24, 2024 Updated last year
lifeiteng / OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯
☆897 Dec 10, 2025 Updated 7 months ago
m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
☆23,170 Jul 13, 2026 Updated last week
m87-labs / moondream
tiny vision language model
☆9,873 Apr 20, 2026 Updated 3 months ago
facebookresearch / seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
☆11,816 Apr 8, 2026 Updated 3 months ago
Vaibhavs10 / insanely-fast-whisper
☆12,991 Oct 25, 2025 Updated 8 months ago
Standard-Intelligence / hertz-dev
first base model for full-duplex conversational audio
☆1,794 Jan 5, 2025 Updated last year
leptonai / leptonai
A Pythonic framework to simplify AI service building
☆2,826 Updated this week
letta-ai / letta
Platform for stateful agents: AI with advanced memory that can learn and self-improve over time.
☆23,903 Jul 3, 2026 Updated 2 weeks ago
suno-ai / bark
🔊 Text-Prompted Generative Audio Model
☆39,204 Aug 19, 2024 Updated last year
andrewnguonly / Lumos
A RAG LLM co-pilot for browsing the web, powered by local LLMs
☆1,515 Jan 26, 2025 Updated last year
coqui-ai / TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆45,783 Aug 16, 2024 Updated last year
LargeWorldModel / LWM
Large World Model -- Modeling Text and Video with Millions Context
☆7,427 Oct 19, 2024 Updated last year
pipecat-ai / pipecat
Open Source framework for voice and multimodal conversational AI
☆13,623 Updated this week
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,424 Nov 19, 2025 Updated 8 months ago
argmaxinc / argmax-oss-swift
On-device Speech AI for Apple Silicon
☆6,280 Jul 13, 2026 Updated last week
janhq / ichigo
Local realtime voice AI
☆2,490 Nov 26, 2025 Updated 7 months ago
ufal / whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
☆3,652 Nov 12, 2025 Updated 8 months ago
nilsherzig / LLocalSearch
LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a ch…
☆5,955 Mar 24, 2026 Updated 3 months ago
transcriptionstream / transcriptionstream
turnkey self-hosted offline transcription and diarization service with llm summary
☆944 Jan 18, 2026 Updated 6 months ago
open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…
☆9,956 Mar 25, 2026 Updated 3 months ago
PKU-YuanGroup / MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322 Jul 15, 2025 Updated last year
facebookresearch / audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
☆2,854 Sep 15, 2024 Updated last year
developersdigest / llm-answer-engine
Perplexity Inspired Answer Engine
☆5,031 Apr 29, 2026 Updated 2 months ago
leptonai / search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
☆8,082 Dec 2, 2025 Updated 7 months ago
ggml-org / whisper.cpp
Port of OpenAI's Whisper model in C/C++
☆52,010 Jul 11, 2026 Updated last week
livekit / agents
A framework for building realtime voice AI agents 🤖🎙️📹
☆11,456 Updated this week