Sally-SH / VSP-LLM
☆334Updated 7 months ago
Alternatives and similar repositories for VSP-LLM
Users that are interested in VSP-LLM are comparing it to the libraries listed below
- This is the official repository for M2UGen☆503Updated 10 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆276Updated 4 months ago
- Speech Diarization for scrum automation☆111Updated 2 years ago
- [IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝☆630Updated last year
- ☆183Updated 3 months ago
- A lightweight end-to-end text-to-speech model☆123Updated 8 months ago
- Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS☆382Updated last year
- ☆166Updated 11 months ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆299Updated last week
- Official Pytorch implementation of StreamV2V.☆514Updated 8 months ago
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation☆394Updated 2 years ago
- ☆525Updated last month
- ☆466Updated 5 months ago
- ☆174Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆314Updated 3 weeks ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching☆794Updated 3 months ago
- ☆467Updated 5 months ago
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution☆366Updated 2 months ago
- [CVPR2024] Make Your Dream A Vlog☆428Updated 5 months ago
- The Data and Code of Prompt2Sign: A Comprehensive Multilingual Sign Language Dataset.☆196Updated 2 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆130Updated 11 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆178Updated last year
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)☆317Updated last month
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization☆220Updated 6 months ago
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation☆297Updated last week
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis☆624Updated 6 months ago
- Nendo is an open source platform for AI-driven audio management, intelligence, and generation.☆128Updated last year
- GPT-4o-level, real-time spoken dialogue system.☆359Updated 9 months ago
- Kyutai with an "eye"☆223Updated 7 months ago
- The project page of Diffutoon☆27Updated last year