Sally-SH / VSP-LLM
☆327Updated 4 months ago
Alternatives and similar repositories for VSP-LLM
Users that are interested in VSP-LLM are comparing it to the libraries listed below
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆242Updated last week
- ☆165Updated 7 months ago
- This is the official repository for M2UGen☆496Updated 6 months ago
- ☆181Updated last month
- A lightweight end-to-end text-to-speech model☆116Updated 5 months ago
- Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS☆382Updated last year
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation☆386Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆304Updated 3 weeks ago
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝☆611Updated last year
- Speech Diarization for scrum automation☆108Updated last year
- Official Pytorch implementation of StreamV2V.☆504Updated 5 months ago
- [CVPR2024] Make Your Dream A Vlog☆427Updated 2 months ago
- The official repo for paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables"☆64Updated 2 months ago
- ☆175Updated last year
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)☆249Updated this week
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆248Updated 3 weeks ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆131Updated 8 months ago
- ☆443Updated 2 months ago
- ☆508Updated last month
- The project page of Diffutoon☆26Updated last year
- ☆429Updated 2 months ago
- Open source inference code for Rev's model☆414Updated 3 months ago
- A toolkit for speaker diarization.☆231Updated last month
- ☆193Updated last year
- Official implementation of the paper "MusicInfuser: Making Video Diffusion Listen and Dance"☆73Updated 3 months ago
- Kyutai with an "eye"☆211Updated 4 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆168Updated last year
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution☆342Updated last month
- ☆260Updated last year
- Incredibly descriptive audiovisual summaries for videos☆41Updated 11 months ago