themanyone / caption_anything

Caption, translate, and optionally record in real time "what you hear" from speakers and microphone. Never miss part of the conversation again.

☆19

Alternatives and similar repositories for caption_anything

Users interested in caption_anything are comparing it to the repositories listed below.

3eeps / llmon-py
llmon-py is a multimodal webui for Llama 3-8B.
☆16 Updated last year
VideotronicMaker / LM-Studio-Voice-Conversation
Python app for LM Studio-enhanced voice conversations with local LLMs. Uses Whisper for speech-to-text and offers a privacy-focused, acce…
☆102 Updated last year
themanyone / whisper_dictation
Private voice keyboard, AI chat, images, webcam, recordings, voice control with >= 4 GiB of VRAM.
☆256 Updated last month
ancs21 / awesome-openai-whisper
A curated list of awesome OpenAI Whisper resources
☆101 Updated last year
OpenVoiceOS / ovos-tts-plugin-piper
☆27 Updated 2 weeks ago
revdotcom / reverb-self-hosted
This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution.
☆53 Updated 7 months ago
Picovoice / orca
On-device streaming text-to-speech engine powered by deep learning
☆98 Updated this week
projecte-aina / oTranscribe-plus
A free & open tool for transcribing audio interviews with offline ASR support
☆24 Updated last year
solarsamuel / pi5_whisper_voice_assistant
This is a Raspberry Pi 5 whisper C++ voice assistant - backwards compatible with Pi4
☆24 Updated last year
tsmdt / whisply
💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... fast!
☆56 Updated last month
rpdrewes / whisper-websocket-server
Self-hosted, high-quality voice recognition for de-googled Android using Whisper. Like Siri or OK Google.
☆64 Updated last year
l33tkr3w / LlamaCards
LlamaCards is a web application that provides a dynamic interface for interacting with LLM models in real-time. This app allows users to …
☆39 Updated 10 months ago
nhaouari / local11labs
Local11Labs allows generating high-quality text-to-speech and podcast content using the fast and tiny Kokoro-82M.
☆47 Updated 6 months ago
BBC-Esq / Faster-Whisper-Transcriber
Record audio and save a transcription to your system's clipboard with ctranslate2 and faster-whisper.
☆133 Updated this week
fakerybakery / simpletts
A lightweight Python library for running TTS models with a unified API.
☆20 Updated 5 months ago
Meeting-BaaS / transcript-seeker
An open-source, browser-based transcript viewer and manager. Upload, transcribe, and chat with meeting recordings using AI. Features meet…
☆55 Updated 2 months ago
agituts / gemini-2-tts
AI-Powered Podcast Generator: A Python-based tool that converts text scripts into realistic audio podcasts using Google's Generative AI A…
☆40 Updated 7 months ago
rajpdus / MeetingSummarizer
Effortlessly record, transcribe, and summarize meetings with this user-friendly desktop utility powered by OpenAI's Whisper and GPT-3.5-t…
☆186 Updated 2 years ago
BBC-Esq / Poor-Man-Vector-Database
Simple GUI to load a PDF/Docx/txt file and have LM Studio answer questions based on it.
☆14 Updated 11 months ago
Luxadevi / Ollama-Companion
Web interface for administering Ollama and model quantization, with public endpoints and an automated OpenAI proxy
☆50 Updated 4 months ago
mikeesto / gemini-transcribe
Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash
☆33 Updated 3 weeks ago
rudymohammadbali / Whisper-Transcriber
Modern Desktop Application offering a suite of tools for audio/video text recognition and a variety of other useful utilities.
☆56 Updated 11 months ago
dkjroot / iris-llm
IRIS: Demonstrator for use of LLMs in Python (outdated)
☆62 Updated 4 months ago
yohasebe / whisper-stream
A bash script using OpenAI Whisper API for continuous audio transcription with automatic silence detection
☆111 Updated last year
dariox1337 / whisper-writer
💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
☆10 Updated 10 months ago
MikaSchultes / GPTAssistant-ElevenLabs
OpenAI-Assistant API integration with Speech Recognition and Eleven Labs TTS. User can choose name, description, model of assistant and …
☆18 Updated last year
linto-ai / linto-diarization
Speaker diarization service
☆23 Updated last month
unicornlaunching / langchain-and-elevenlabs-with-pdf-analysis
How might we mix OpenAI and Langchain and ElevenLabs to speak out responses to prompts using a body of knowledge encapsulated in PDFs?
☆38 Updated 2 years ago
NeonGeckoCom / neon-tts-plugin-coqui
Coqui AI TTS plugin
☆85 Updated 3 weeks ago
gladiaio / gladia-samples
☆55 Updated 2 weeks ago