BBC-Esq / WhisperS2T-transcriber
Uses the powerful WhisperS2T and CTranslate2 libraries to batch transcribe multiple files
☆17Updated 2 months ago
Related projects
Alternatives and complementary repositories for WhisperS2T-transcriber
- Multivoice: Enhance your foreign-language movie and TV show experience with personalized dubbed versions. Our project uses voice cloning …☆24Updated last year
- Easy tool that splits given audio based on speaker.☆11Updated 10 months ago
- ☆9Updated last month
- Real-time end-to-end singing voice conversion☆18Updated 2 weeks ago
- Heteronym to Phoneme Parser☆15Updated last year
- Uses deepgram/whisper/custom models to create an LJSpeech dataset for voice model fine tuning☆12Updated 2 weeks ago
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution.☆40Updated 3 weeks ago
- ☆20Updated 3 weeks ago
- Text-to-Music Generation with Rectified Flow Transformer☆48Updated 2 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆45Updated 2 weeks ago
- Running the F5-TTS by ONNX Runtime☆35Updated this week
- AudioLDM text to audio colab☆19Updated last year
- This project includes a Python script for fine-tuning a text-to-speech (TTS) model. The script utilizes custom datasets and uses CUDA for …☆13Updated last month
- Robust Speech Recognition via Large-Scale Weak Supervision☆30Updated 11 months ago
- Versatile AI-driven audio upscaler to enhance the quality of any audio.☆60Updated 2 months ago
- ☆77Updated 4 months ago
- A minimalistic automatic speech recognition Streamlit-based web app powered by OpenAI's Whisper "State of the Art" models☆65Updated 2 years ago
- ☆61Updated 3 months ago
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX☆23Updated last month
- llmon-py is a multimodal webui for Llama 3-8B.☆15Updated 4 months ago
- Speech enhancement in noisy and reverberant environments using deep neural networks☆15Updated last month
- ☆87Updated 6 months ago
- The YouTube Text-To-Speech dataset consists of waveform audio extracted from YouTube videos alongside their English transcriptions☆50Updated 3 years ago
- Production-ready vocoder using BigVSAN☆11Updated 9 months ago
- Use VITS and Opencpop to develop singing voice synthesis; different from VISinger.☆32Updated last year
- This repo is an exploratory experiment to enable frozen pretrained RWKV language models to accept speech modality input. We followed the …☆32Updated last week
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration☆84Updated last month