fixie-ai / ultravox
A fast multimodal LLM for real-time voice
★2,760 Updated this week
Alternatives and similar repositories for ultravox:
Users who are interested in ultravox are comparing it to the libraries listed below.
- first base model for full-duplex conversational audio ★1,669 Updated last week
- Local realtime voice AI ★2,162 Updated this week
- Build real-time multimodal AI applications 🤖🎙️📹 ★4,588 Updated this week
- Open Source framework for voice and multimodal conversational AI ★4,299 Updated this week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ★3,679 Updated last month
- ★7,156 Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ★2,746 Updated 2 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ★2,492 Updated this week
- Inference and training library for high-quality TTS models. ★4,910 Updated last month
- Convert any PDF into a podcast episode! ★1,816 Updated last month
- Whisper with Medusa heads ★818 Updated 2 weeks ago
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. ★1,570 Updated 5 months ago
- A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech mode… ★866 Updated 2 months ago
- Interface for OuteTTS models. ★859 Updated this week
- PraisonAI is an AI Agents Framework with Self Reflection. PraisonAI application combines PraisonAI Agents, AutoGen, and CrewAI into a low… ★3,009 Updated this week
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you automate the web wi… ★2,898 Updated last week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ★3,518 Updated 2 weeks ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ★3,346 Updated this week
- Everything about the SmolLM & SmolLM2 family of models ★1,554 Updated last week
- Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision. ★3,055 Updated this week
- Flexible and powerful framework for managing multiple AI agents and handling complex conversations ★3,835 Updated this week
- An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Co… ★2,413 Updated last week
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ★2,249 Updated this week
- Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching ★632 Updated this week
- Local SRT/LLM/TTS Voicechat ★590 Updated 3 months ago
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ★2,022 Updated this week
- Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet. Powered by Vercel… ★3,891 Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ★4,966 Updated this week
- Full toolkit for running an AI agent service built with LangGraph, FastAPI and Streamlit ★1,375 Updated last week
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ★2,159 Updated this week