themanyone / caption_anything
Caption, translate, and optionally record in real time "what you hear" from speakers and microphone. Never miss part of the conversation again.
☆16Updated last year
Alternatives and similar repositories for caption_anything:
Users who are interested in caption_anything are comparing it to the repositories listed below
- llmon-py is a multimodal webui for Llama 3-8B.☆16Updated 9 months ago
- This is a Raspberry Pi 5 whisper C++ voice assistant - backwards compatible with Pi4☆21Updated last year
- A simple system for 2-way interruptible voice interactions between human and LLM☆25Updated last year
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution.☆53Updated 4 months ago
- IRIS: Demonstrator for use of LLMs in python (outdated)☆62Updated last month
- Local character AI chatbot with chroma vector store memory and some scripts to process documents for Chroma☆33Updated 6 months ago
- Realtime vision and voice controlled AI assistant☆16Updated last week
- Self hosted high quality voice recognition for de-googled Android using whisper. Like Siri or OK Google.☆63Updated last year
- LlamaCards is a web application that provides a dynamic interface for interacting with LLM models in real-time. This app allows users to …☆38Updated 7 months ago
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio☆36Updated last year
- Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and…☆65Updated 3 weeks ago
- Web interface for administering Ollama and model quantization, with public endpoints and an automated OpenAI proxy☆50Updated last month
- Yet another frontend for LLM, written using .NET and WinUI 3☆10Updated 5 months ago
- A project that brings the power of Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) within reach of everyone, particu…☆34Updated last year
- My version of an LLM Websearch Agent using a local SearXNG server because SearXNG is great.☆30Updated last month
- Speak (speech-to-text) to LLMs (Ollama) in any language - Streamlit app☆43Updated last year
- ☆14Updated 2 months ago
- ☆37Updated last year
- Uses a Gradio interface to stream coding related responses from local and cloud based large language models. Pulls context from GitHub Re…☆21Updated last month
- ☆17Updated 4 months ago
- ☆13Updated last month
- Experimental sampler to make LLMs more creative☆30Updated last year
- Using FastChat-T5 Large Language Model, Vosk API for automatic speech recognition, and Piper for text-to-speech☆118Updated last year
- Voice assistant with audio input and audio output using Whisper and Eleven Labs☆11Updated this week
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati…☆26Updated 3 months ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…☆33Updated 9 months ago
- Little AI roleplay program☆57Updated last year
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…☆19Updated 6 months ago
- A composition of offline tools to achieve high quality multilingual speech to text transcription☆18Updated last week
- An Extension for oobabooga/text-generation-webui☆36Updated last year