Sally-SH / VSP-LLM
☆300 Updated 6 months ago
Related projects
Alternatives and complementary repositories for VSP-LLM
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝 ☆468 Updated 3 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆97 Updated this week
- Interface for OuteTTS models. ☆409 Updated 2 weeks ago
- A toolkit for speaker diarization. ☆146 Updated last week
- ☆282 Updated 2 weeks ago
- Open source inference code for Rev's model ☆335 Updated last week
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation ☆360 Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆237 Updated last week
- ☆166 Updated 4 months ago
- ☆171 Updated 11 months ago
- We Speech Transcript based on LLM, in 300 lines of code. ☆127 Updated 3 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio. ☆127 Updated 5 months ago
- This is the official repository for M2UGen ☆448 Updated 6 months ago
- [CVPR2024] Make Your Dream A Vlog ☆416 Updated 8 months ago
- Official Pytorch implementation of StreamV2V. ☆451 Updated 2 months ago
- Bring portraits to life via Monitor! ☆256 Updated 3 months ago
- A lightweight end-to-end text-to-speech model ☆91 Updated 2 months ago
- Ultimate Vocal Remover 5 with Gradio UI. Separate an audio file into various stems, using multiple models ☆216 Updated this week
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆813 Updated 4 months ago
- Multimodal Models in Real World ☆404 Updated 3 weeks ago
- ☆254 Updated 8 months ago
- Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen ☆334 Updated last month
- Nendo is an open source platform for AI-driven audio management, intelligence, and generation. ☆117 Updated 8 months ago
- Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS ☆379 Updated 11 months ago
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … ☆56 Updated this week
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆781 Updated 3 weeks ago
- SEED-Story: Multimodal Long Story Generation with Large Language Model ☆751 Updated last month
- ☆356 Updated 5 months ago
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability. ☆222 Updated 2 months ago
- Using Claude Sonnet 3.5 to forward (reverse) engineer code from VASA white paper - WIP - (this is for La Raza 🎷) ☆229 Updated last week