glowinthedark / subtitles-ocr
Hard-burned subtitles OCR to SRT extractor
☆14 · Updated last year
Related projects
Alternatives and complementary repositories for subtitles-ocr
- ☆10 · Updated 2 months ago
- Triton backend for https://github.com/OpenNMT/CTranslate2 ☆9 · Updated 3 months ago
- This is a TTS model based on VITS that can control the output speech emotion through natural language and control the speaker through ref… ☆4 · Updated 3 months ago
- This repository is the implementation of the paper "Score-balanced Loss for Multi-aspect Pronunciation Assessment" (Interspeech 2023). ☆15 · Updated 6 months ago
- Transferability of cross-lingual and cross-age speech emotion recognition ☆17 · Updated last year
- The case study and multilingual performance of ICASSP submission ☆19 · Updated 2 years ago
- AsoSoft Speech Corpus for Central-Kurdish Text-To-Speech ☆13 · Updated 2 years ago
- Whisper fine-tuning event script to use multiple hf datasets ☆32 · Updated last year
- 'Grad-TTS' with Multilingual Cleaners ☆10 · Updated 7 months ago
- ☆25 · Updated 2 years ago
- Real-time Japanese speech recognition translator using wav2vec2 ☆33 · Updated 2 years ago
- Convert English text from written expressions into spoken forms ☆21 · Updated 2 years ago
- Open Source Speech/Text Data on AI ☆18 · Updated 2 years ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model. ☆12 · Updated last year
- A pipeline to isolate and transcribe one language in mixed-language speech ☆18 · Updated 2 years ago
- OpenAI Whisper Prompt Examples ☆48 · Updated last year
- Robust Speech Recognition via Large-Scale Weak Supervision ☆30 · Updated 11 months ago
- A minimalistic automatic speech recognition Streamlit-based webapp powered by OpenAI's Whisper "State of the Art" models ☆65 · Updated 2 years ago
- Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recogni… ☆24 · Updated 3 years ago
- Word Error Rate Estimation ☆10 · Updated 4 years ago
- wav2vec2 ASR with transformers ☆14 · Updated 3 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Updated 9 months ago
- A handy dataset of noises for ASR ☆19 · Updated 5 years ago
- Goodness of Pronunciation algorithm using PyKaldi ☆14 · Updated 2 years ago
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l… ☆22 · Updated 3 months ago
- Zero-Shot Foreign Accent Conversion without a Native Reference ☆28 · Updated 6 months ago
- Text to speech is an emerging zone of AI. This repository helps to create a dataset with audio and transcripts for personalized text to s… ☆27 · Updated last year
- ☆10 · Updated last year
- ☆30 · Updated 3 years ago
- Vocal Synthesis Through MIDI and Vocal Transformation Using RVC (KO, EN, JA, ZH) ☆22 · Updated last year