themanyone / caption_anything
Caption, translate, and optionally record in real time "what you hear" from speakers and microphone. Never miss part of the conversation again.
☆16Updated 11 months ago
Alternatives and similar repositories for caption_anything:
Users that are interested in caption_anything are comparing it to the libraries listed below
- llmon-py is a multimodal webui for Llama 3-8B.☆16Updated 7 months ago
- Convert your PDFs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient…☆44Updated last week
- Speak (speech-to-text) to LLMs (Ollama) in any language - Streamlit app☆40Updated 11 months ago
- Sophia AI Assistant is a Python-based desktop AI that performs a variety of tasks, including answering questions, opening applications, b…☆13Updated 4 months ago
- Self hosted high quality voice recognition for de-googled Android using whisper. Like Siri or OK Google.☆60Updated last year
- a simple system for 2-way interruptible voice interactions between human and LLM☆22Updated last year
- Local character AI chatbot with chroma vector store memory and some scripts to process documents for Chroma☆32Updated 4 months ago
- A composition of offline tools to achieve high quality multilingual speech to text transcription☆16Updated this week
- ☆24Updated last year
- A voice assistant with WhisperAI speech recognition☆29Updated 2 months ago
- A lightweight Python library for running TTS models with a unified API.☆16Updated this week
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution.☆49Updated 2 months ago
- Generates ChatGPT/BingChat & GPT-4 prompts using this model trained by Kaludi. Enter a role and a prompt will be generated based on it.☆26Updated last year
- streaming speech to text server using Whisper☆86Updated last year
- Web interface for administering Ollama and model quantization, with public endpoints and an automated OpenAI proxy☆51Updated 9 months ago
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio☆36Updated last year
- this master thesis project is based on OpenAI Whisper with the goal to transcribe interviews☆47Updated 6 months ago
- Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.☆40Updated this week
- Local & private voice controlled notepad using whisper.cpp☆23Updated last year
- OpenAI-Assistant API integration with Speech Recognition and Eleven Labs TTS. User can choose name, description, model of assistant and …☆18Updated last year
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines☆92Updated 9 months ago
- A tiny server to run local inference on MLX model in the style of OpenAI☆12Updated last year
- Demo python script app to interact with llama.cpp server using whisper API, microphone and webcam devices.☆46Updated last year
- This is a Raspberry Pi 5 whisper C++ voice assistant - backwards compatible with Pi4☆19Updated last year
- LLM Chat is an open-source serverless alternative to ChatGPT.☆31Updated 5 months ago
- WIP exploration using Twilio Media Streams and Generative AI☆39Updated last year
- ☆34Updated 4 months ago
- Real-time TTS reading of large text files in your favourite voice, plus translation via LLM (Python script)☆53Updated 4 months ago
- Pybind11 bindings for Whisper.cpp☆50Updated 2 weeks ago
- A project that brings the power of Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) within reach of everyone, particu…☆34Updated last year