themanyone / caption_anything
Caption, translate, and optionally record in real time "what you hear" from speakers and microphone. Never miss part of the conversation again.
☆17 · Updated last week
Alternatives and similar repositories for caption_anything
Users who are interested in caption_anything are comparing it to the libraries listed below.
- llmon-py is a multimodal web UI for Llama 3 8B. ☆16 · Updated 10 months ago
- IRIS: Demonstrator for use of LLMs in Python (outdated) ☆62 · Updated last month
- Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and… ☆73 · Updated last month
- Little AI roleplay program ☆58 · Updated last year
- This is a Raspberry Pi 5 Whisper C++ voice assistant, backwards compatible with Pi 4 ☆21 · Updated last year
- Self-hosted, high-quality voice recognition for de-googled Android using Whisper. Like Siri or OK Google. ☆64 · Updated last year
- OpenAI-Assistant API integration with Speech Recognition and Eleven Labs TTS. User can choose name, description, model of assistant and … ☆18 · Updated last year
- On-device streaming text-to-speech engine powered by deep learning ☆79 · Updated last week
- Like ChatGPT's voice conversations with an AI, but entirely offline/private/trade-secret-friendly, using local AI models such as Llama 2 … ☆158 · Updated 8 months ago
- A composition of offline tools to achieve high-quality multilingual speech-to-text transcription ☆18 · Updated 3 weeks ago
- A simple system for two-way interruptible voice interactions between a human and an LLM ☆29 · Updated last year
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆52 · Updated 5 months ago
- Text generation in Python, as easy as possible ☆60 · Updated this week
- On-device noise suppression powered by deep learning ☆69 · Updated last week
- Real‑time, low‑latency voice, vision, and conversational‑memory AI assistant built on LiveKit and local LLMs ☆28 · Updated 3 weeks ago
- A cutting-edge cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities. ☆87 · Updated last week
- Private voice keyboard, AI chat, images, webcam, recordings, and voice control with >= 4 GiB of VRAM. ☆230 · Updated this week
- Speak (speech-to-text) to LLMs (Ollama) in any language: a Streamlit app ☆43 · Updated last year
- ☆16 · Updated last week
- ☆18 · Updated 2 years ago
- Recipes for on-device voice AI and local LLMs ☆82 · Updated this week
- ☆22 · Updated 9 months ago
- ☆12 · Updated last year
- Get started using Deepgram's Live Transcription with this Flask demo app ☆33 · Updated this week
- Allows two LLMs to communicate and run code in the terminal ☆24 · Updated 5 months ago
- ☆14 · Updated 3 months ago
- Multimodal AI app using LLaVA 7B and Gradio. ☆38 · Updated last year
- ☆13 · Updated 2 months ago
- Deploy your GGML models to Hugging Face Spaces with Docker and Gradio ☆36 · Updated last year
- Speaker diarization service ☆22 · Updated last month