Sally-SH / VSP-LLM
☆331 · Updated 5 months ago
Alternatives and similar repositories for VSP-LLM
Users who are interested in VSP-LLM are comparing it to the libraries listed below.
- ☆166 · Updated 9 months ago
- A lightweight end-to-end text-to-speech model · ☆119 · Updated 6 months ago
- Speech Diarization for scrum automation · ☆111 · Updated 2 years ago
- This is the official repository for M2UGen · ☆500 · Updated 8 months ago
- The Data and Code of Prompt2Sign: A Comprehensive Multilingual Sign Language Dataset · ☆183 · Updated 3 weeks ago
- ☆175 · Updated last year
- ☆183 · Updated last month
- Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS · ☆384 · Updated last year
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation · ☆387 · Updated 2 years ago
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer · ☆308 · Updated 2 months ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) · ☆264 · Updated this week
- ☆193 · Updated last year
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. AI Foley master: adds vivid, synchronized sound effects to your silent videos 😝 · ☆623 · Updated last year
- [CVPR2024] Make Your Dream A Vlog · ☆427 · Updated 3 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" · ☆129 · Updated 9 months ago
- Official PyTorch implementation of StreamV2V · ☆510 · Updated 7 months ago
- Open-source inference code for Rev's model · ☆428 · Updated 4 months ago
- The official repo for the paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables" · ☆67 · Updated 3 weeks ago
- ☆445 · Updated 4 months ago
- ☆222 · Updated last year
- ☆262 · Updated last year
- A toolkit for speaker diarization · ☆283 · Updated last week
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution · ☆359 · Updated last month
- Have a natural voice conversation with an LLM · ☆254 · Updated 9 months ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis · ☆611 · Updated 5 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation · ☆258 · Updated 2 months ago
- ☆457 · Updated 3 months ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching · ☆781 · Updated last month
- Incredibly descriptive audiovisual summaries for videos · ☆41 · Updated last year
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model" · ☆918 · Updated 10 months ago