winstxnhdw / CapGen
A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.
☆10 · Updated this week
Alternatives and similar repositories for CapGen
Users who are interested in CapGen are comparing it to the libraries listed below.
- Implementation of SoundStream from the paper "SoundStream: An End-to-End Neural Audio Codec" ☆12 · Updated 7 months ago
- Automatically generate a lip-synced avatar based on a transcript and audio ☆13 · Updated 2 years ago
- Simple, Unified Repository for Retrieval-based Voice Conversion ☆17 · Updated last year
- A full-stack web framework built to meet developers' expectations. ☆16 · Updated 2 years ago
- Extract information from XBRL files in the ESEF format ☆12 · Updated last week
- FastAPI backend to upload files to S3 ☆27 · Updated 5 years ago
- Modify-Anything is based on YOLOv5 and YOLOv8 for video and image detection. Segment-anything and lama_cleaner are applied to segment, modify, era… ☆15 · Updated 2 years ago
- Multivoice: Enhance your foreign-language movie and TV show experience with personalized dubbed versions. Our project uses voice cloning … ☆26 · Updated 2 years ago
- Code for the paper "Free-View Expressive Talking Head Video Editing" (ICASSP 2023) ☆10 · Updated last year
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. … ☆12 · Updated 8 months ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 · Updated last year
- Reading and writing .WAV files in Python ☆19 · Updated 6 years ago
- Fast and accurate natural language detection. Detector written in Python. Nito-ELD, ELD. ☆17 · Updated last year
- A Python library to find differences between audio and transcriptions ☆20 · Updated last year
- A composition of offline tools to achieve high-quality multilingual speech-to-text transcription ☆19 · Updated this week
- DoyenTalker uses deep learning techniques to generate personalized avatar videos that speak user-provided text in a specified voice. The … ☆13 · Updated 11 months ago
- An image-editing app, like a mini Photoshop, built with Python, PyQt5, and deep learning ☆11 · Updated 2 years ago
- AI lip-syncing application deployed on Streamlit ☆43 · Updated last year
- ☆10 · Updated last year
- Simple-to-use, pretrained, training-free models for speaker diarization ☆21 · Updated 2 years ago
- ☆13 · Updated 3 years ago
- Speaker-disentangled quantizer for the linguistic content of speech ☆22 · Updated 5 months ago
- 🔊😊 A FastAPI voice-assistant framework to quickly prototype LLM-powered voice assistants in under 5 minutes. ☆28 · Updated last year
- Talking Face Generation system ☆19 · Updated last year
- A Chrome extension for querying a local LLM using llama-cpp-python; includes a pip package for running the server, 'pip install loca… ☆17 · Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆67 · Updated last month
- Engage in conversation with your virtual self using AI techniques like NLP, voice cloning, and computer vision. Get accurate answers with… ☆85 · Updated 2 years ago
- Detects which song in a database a segment belongs to, and returns Nil if the song does not exist in the database. ☆22 · Updated 4 years ago
- A minimalistic automatic speech recognition Streamlit-based web app powered by OpenAI's Whisper "State of the Art" models ☆66 · Updated 2 years ago
- Code for the paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition" ☆26 · Updated 2 years ago