Sally-SH / VSP-LLM
☆325 · Updated 3 months ago
Alternatives and similar repositories for VSP-LLM
Users who are interested in VSP-LLM are comparing it to the libraries listed below.
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation ☆384 · Updated last year
- ☆161 · Updated 7 months ago
- Official PyTorch implementation of StreamV2V. ☆499 · Updated 4 months ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆232 · Updated this week
- ☆180 · Updated 3 weeks ago
- ☆258 · Updated last year
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆912 · Updated 8 months ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆752 · Updated 3 weeks ago
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. AI Foley master: adds vivid, synchronized sound effects to your silent videos 😝 ☆601 · Updated 11 months ago
- This is the official repository for M2UGen ☆495 · Updated 5 months ago
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution ☆325 · Updated last week
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆131 · Updated 7 months ago
- Speech Diarization for scrum automation ☆107 · Updated last year
- Kyutai with an "eye" ☆201 · Updated 3 months ago
- A lightweight end-to-end text-to-speech model ☆114 · Updated 4 months ago
- [CVPR2024] Make Your Dream A Vlog ☆425 · Updated last month
- Official repository of "TryOffAnyone: Tiled Cloth Generation from a Dressed Person" ☆175 · Updated 4 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆243 · Updated this week
- [ICML 2025] Official PyTorch implementation of LongVU ☆382 · Updated last month
- ☆488 · Updated last week
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆837 · Updated 11 months ago
- MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion ☆218 · Updated 2 months ago
- ☆411 · Updated last month
- Using Claude Sonnet 3.5 to forward (reverse) engineer code from VASA white paper - WIP - (this is for La Raza 🎷) ☆294 · Updated 7 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆166 · Updated last year
- A toolkit for speaker diarization. ☆209 · Updated 3 weeks ago
- PyTorch code for "BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis" ☆300 · Updated 6 months ago
- ☆432 · Updated last month
- Official Implementation of 'E4S: Fine-grained Face Swapping via Editing With Regional GAN Inversion' ☆141 · Updated 11 months ago
- Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis ☆358 · Updated 5 months ago