groundcat / Google-AI-video-transcribe-subtitle-generator

Transcribes video using GCP speech-to-text and generates .SRT subtitles

☆16

Alternatives and similar repositories for Google-AI-video-transcribe-subtitle-generator

Users who are interested in Google-AI-video-transcribe-subtitle-generator are comparing it to the repositories listed below

flozi00 / atra
An open source NLP as a service project focused on providing state of the art systems with ease. Training and inference by simple docker …
☆20 Updated 8 months ago
lukerbs / forcealign
ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word or phoneme level…
☆15 Updated 6 months ago
mayukhnair / deepspeech-colab
Running Mozilla's implementation of Baidu DeepSpeech on Google Colaboratory
☆16 Updated 6 years ago
ryanrudes / YTTTS
The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions
☆51 Updated 4 years ago
HLTCHKUST / elderly_ser
Transferability of cross-lingual and cross-age speech emotion recognition
☆18 Updated last year
bryan-brancotte / subtitle_to_speech
Convert subtitles (.srt) to speech (.wav) using the Google API
☆42 Updated 3 years ago
zilliz-bootcamp / audio_search
This project uses PANNs for audio tagging and sound event detection to obtain audio embeddings. Then Milvus is used to search the s…
☆24 Updated 3 years ago
kavishgambhir / xy-cut-tree
Segmenting a given document using the recursive xy-cut algorithm.
☆12 Updated 6 years ago
daanzu / wav2vec2_stt_python
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recogni…
☆24 Updated 3 years ago
Fcabla / whisper_subtitler
Generate transcriptions and subtitles using OpenAI whisper as a base model, stable-ts/whisperx as a timestamp stabilizer using ASR models…
☆18 Updated 2 years ago
Yaoming95 / UniPunc
The case study and multilingual performance of ICASSP submission
☆24 Updated 2 years ago
muhammad-ahmed-ghani / svoice_demo
A PyTorch demo of the paper Voice Separation with an Unknown Number of Multiple Speakers using gradio and Nvidia NEMO ASR model.
☆36 Updated last year
farisalasmary / deepspeech2-online-decoder
Online (real-time) decoder to be used with DeepSpeech2 model
☆25 Updated 5 years ago
lucasjinreal / aural
A Tiny Project For ASR model training and Deployment
☆27 Updated 2 years ago
ex3ndr / supervoice-separate
Supervoice Speaker Separation Network
☆12 Updated last year
MiniXC / LightningFastSpeech2
☆56 Updated 2 years ago
igormq / ctcdecode-pytorch
Python implementation of CTC beam search decoder + agnostic LM scorer
☆19 Updated 4 years ago
xingmegshuo / zhrtvc
Chinese real-time voice cloning
☆38 Updated 5 years ago
DCGM / SoftCTC
This repository contains the source code for SoftCTC. The original paper can be found here: https://arxiv.org/abs/2212.02135
☆19 Updated 2 years ago
rash1993 / movie-asd
Repo for active speaker detection in media videos.
☆27 Updated last year
leonardltk / Shazam-An-Industrial-Strength-Audio-Search-Algorithm-
Detects which song in the database a segment belongs to, and returns Nil if it does not exist in the database.
☆21 Updated 4 years ago
sshh12 / Conv-VAD
A packaged convolutional voice activity detector for noisy environments.
☆14 Updated 5 years ago
pyannote / hf-speaker-diarization-3.1
Mirror of hf.co/pyannote/speaker-diarization-3.1
☆23 Updated last year
fmiotello / fastVC
A simple voice conversion tool
☆17 Updated 3 years ago
EliasVincent / whisper-subtitles-webui
A gradio interface for making transcribed and translated subtitles for videos
☆40 Updated 3 months ago
didi / MeetDot
☆11 Updated 3 years ago
doerlbh / MiniVox
Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
☆27 Updated 3 years ago
cronrpc / Audio-Speaker-Needle-In-Haystack
Finding the most similar timbre (tone color) in a large collection of audio.
☆13 Updated 11 months ago
iwater / Real-Time-Voice-Cloning-Chinese
Clone a voice in 5 seconds to generate arbitrary speech in real-time
☆34 Updated 5 years ago
daanzu / wenet_stt_python
☆33 Updated 3 years ago