mikeesto / gemini-transcribe
Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash
☆51 · Updated 3 weeks ago
Alternatives and similar repositories for gemini-transcribe
Users interested in gemini-transcribe are comparing it to the libraries listed below.
- A highly optimized engine for neutts-air model to generate minutes of audio in seconds. Over 200x realtime on modern hardware! ☆105 · Updated 2 months ago
- Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and… ☆158 · Updated 2 weeks ago
- Local11Labs allows generating high-quality text-to-speech and podcast content using the fast and tiny Kokoro-82M. ☆48 · Updated last year
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆53 · Updated last year
- A simple system for 2-way interruptible voice interactions between a human and an LLM ☆30 · Updated last year
- speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with… ☆249 · Updated 5 months ago
- Audiobook creation tool with support for multiple TTS models (MiraTTS, GLM-TTS, IndexTTS2, VibeVoice, Higgs V2, Fish S1-mini, Chatterbox,… ☆66 · Updated this week
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ☆100 · Updated last year
- Very fast, accurate speaker diarization ☆222 · Updated 3 weeks ago
- ez audio transcription tool with flexible processing and post-processing options ☆162 · Updated last year
- ☆43 · Updated last year
- This project includes a Python script for fine-tuning a text-to-speech (TTS) model. The script uses custom datasets and uses CUDA for … ☆13 · Updated last year
- Open TTS models, built for streaming on the edge ☆44 · Updated 10 months ago
- Add real-time Speech-to-Text to your LiveKit application with AssemblyAI ☆18 · Updated 7 months ago
- Web Interface for Vision Language Models Including InternVLM2 ☆25 · Updated last year
- ☆54 · Updated 8 months ago
- Web-based editor for subtitles and transcripts ☆143 · Updated last year
- Open-source Python program for automating gain staging. Part 1 of a series for automating audio processing tasks; the end goal is to create a… ☆46 · Updated 2 years ago
- Efficient approach to speaker diarization using voice characteristics extraction ☆106 · Updated 7 months ago
- Roomey is a multi-purpose Voice Agent designed to run your personal and business life. ☆60 · Updated 7 months ago
- ☆345 · Updated 5 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆219 · Updated 9 months ago
- Automatically generate engaging AI podcasts from nothing but an episode title. ☆142 · Updated 6 months ago
- Voice agent using LiveKit (orchestration), Cartesia (TTS), OpenAI (LLM), and Deepgram (STT) ☆20 · Updated 3 months ago
- An open-source, browser-based transcript viewer and manager. Upload, transcribe, and chat with meeting recordings using AI. Features meet… ☆64 · Updated 8 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆70 · Updated 3 months ago
- 💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and subtitle generation using OpenAI’s Whisp… ☆85 · Updated 2 weeks ago
- Record audio and save a transcription to your system's clipboard with ctranslate2 and faster-whisper. ☆168 · Updated last week
- Faster Whisper ASR transcription with CTranslate2 ☆24 · Updated last year
- ☆54 · Updated last week