Sally-SH / VSP-LLM
☆341 Updated 10 months ago
Alternatives and similar repositories for VSP-LLM
Users who are interested in VSP-LLM are comparing it to the repositories listed below
- ☆167 Updated last year
- ☆186 Updated 6 months ago
- This is the official repository for M2UGen ☆511 Updated last year
- Speech Diarization for scrum automation ☆111 Updated 2 years ago
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation ☆401 Updated 2 years ago
- A lightweight end-to-end text-to-speech model ☆126 Updated 11 months ago
- Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS ☆381 Updated 2 years ago
- ☆175 Updated 2 years ago
- The project page of Diffutoon ☆28 Updated 2 years ago
- ☆204 Updated last year
- [CVPR2024] Make Your Dream A Vlog ☆432 Updated 8 months ago
- GPT-4o-level, real-time spoken dialogue system. ☆369 Updated last year
- Open-source inference code for Rev's model ☆435 Updated 9 months ago
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆329 Updated last month
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆131 Updated last year
- The Data and Code of Prompt2Sign: A Comprehensive Multilingual Sign Language Dataset. ☆210 Updated 5 months ago
- The official repo for the paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables" ☆72 Updated 5 months ago
- ☆258 Updated last year
- Official PyTorch implementation of StreamV2V. ☆532 Updated last month
- [IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. AI Foley master: adds vivid, synchronized sound effects to your silent videos 😝 ☆644 Updated last year
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆418 Updated 3 months ago
- ☆222 Updated last year
- A toolkit for speaker diarization. ☆392 Updated this week
- RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios ☆79 Updated 7 months ago
- WIP, running some training with overfitting: https://wandb.ai/snoozie/vasa-overfitting ☆310 Updated 2 weeks ago
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution ☆376 Updated 2 weeks ago
- ☆486 Updated 9 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆187 Updated last year
- We Speech Transcript based on LLM, in 300 lines of code. ☆183 Updated 7 months ago
- Official Implementation of "Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models" ☆18 Updated 2 years ago