winstxnhdw / CapGen
A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.
☆11 · Updated this week
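CapGen's end product is a caption file. The following is only a rough illustration of that final step, not CapGen's code: a minimal Python sketch that turns timed transcript segments into SubRip (SRT) text. The `(start_seconds, end_seconds, text)` tuple shape and the names `format_timestamp` and `segments_to_srt` are hypothetical. In a real Whisper/CTranslate2 pipeline, the transcriber would supply the segments.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) tuples (hypothetical shape) as SRT."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each SRT cue: sequence number, time range, caption text, then a blank line.
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "World")]))
```

Doing the timestamp arithmetic in integer milliseconds avoids the drift that repeated float subtraction would introduce into the `,mmm` field.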
Alternatives and similar repositories for CapGen
Users who are interested in CapGen are comparing it to the libraries listed below.
- A full-stack web framework built to meet developers' expectations. ☆16 · Updated 2 years ago
- An awesome list that curates the best Flet tools, tutorials, blogs, and more. ☆10 · Updated 3 years ago
- Sample and Computation Redistribution for Efficient Face Detection ☆16 · Updated last year
- Code for the ACL 2024 Findings paper "wav2vec-S: Adapting Pre-trained Speech Models for Streaming" ☆10 · Updated 9 months ago
- Implementation of SoundStream from the paper "SoundStream: An End-to-End Neural Audio Codec" ☆13 · Updated last year
- This repository is the project page for "Point Anywhere: Directed Object Estimation from Omnidirectional Images", including source code … ☆12 · Updated 2 years ago
- ☆17 · Updated 2 years ago
- Simple, unified repository for Retrieval-based Voice Conversion ☆17 · Updated last year
- rmp data ranking ☆13 · Updated 3 months ago
- Engage in conversation with your virtual self using AI techniques like NLP, voice cloning, and computer vision. Get accurate answers with… ☆84 · Updated 2 years ago
- 🤖 Quantum-powered excuse generator for developers. Blame bugs on cosmic rays, AI sentience, or Schrödinger’s intern. ☆28 · Updated 5 months ago
- FastAPI backend to upload files to S3 ☆27 · Updated 5 years ago
- AI Talking Head: create video from plain text or an audio file in minutes, with support for 100+ languages and 350+ voice models. ☆37 · Updated 3 years ago
- [WACV 2026] LASER: Lip Landmark Assisted Speaker Detection for Robustness (official implementation) ☆20 · Updated this week
- Extract information from XBRL files in the ESEF format ☆13 · Updated last month
- Convert any image into a Region Adjacency Graph (RAG) ☆12 · Updated 5 years ago
- Reading and writing .WAV files in Python ☆19 · Updated 6 years ago
- Library for converting RGB/grayscale images to base64 and back ☆19 · Updated 3 years ago
- Transcription and diarization (speaker identification) ☆34 · Updated 2 years ago
- Visual similarity search engine demo using PyTorch Metric Learning and Qdrant ☆12 · Updated 3 years ago
- Browser automation for creating new pages in WordPress ☆13 · Updated 8 months ago
- ⚙️ Unified environment variable and settings management for FastAPI and beyond 🚀 ☆33 · Updated last week
- Chat with complex PDFs containing tables using IBM WatsonX, Langchain, and LlamaParser. ☆14 · Updated 4 months ago
- A mini Photoshop-like image editing app built with Python, PyQt5, and deep learning ☆12 · Updated 2 years ago
- Code for the paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition" ☆28 · Updated 2 years ago
- Async KeyDB Python client ☆16 · Updated 2 years ago
- Automatically generate a lip-synced avatar based on a transcript and audio ☆14 · Updated 2 years ago
- The Land-Diffuser is a novel application of the Denoising Diffusion Probabilistic Model (DDPM) in the realm of 3D Talking Head generation… ☆13 · Updated 2 years ago
- Faysal-MD / Unmasking-Deepfake-Faces-from-Videos-An-Explainable-Cost-Sensitive-Deep-Learning-Approach-IEEE2023: Deepfake face detection in forged videos, using explainable AI for model robustness and cost-sensitive methods for mitiga… ☆10 · Updated last year
- Alternative version of st.camera_input that returns live webcam images without requiring a button press ☆38 · Updated 6 months ago