Sally-SH / VSP-LLM
☆294 · Updated 4 months ago
Related projects:
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝 ☆395 · Updated last month
- Official PyTorch implementation of StreamV2V. ☆429 · Updated last week
- [CVPR 2024] Make Your Dream A Vlog ☆410 · Updated 6 months ago
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation ☆353 · Updated last year
- This is the official repository for M2UGen ☆439 · Updated 4 months ago
- ☆166 · Updated 9 months ago
- ☆161 · Updated 2 months ago
- Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS ☆376 · Updated 9 months ago
- ☆244 · Updated 6 months ago
- Zero-shot voice conversion with in-context learning ☆163 · Updated this week
- Speech diarization for Scrum automation ☆94 · Updated last year
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection ☆171 · Updated 2 weeks ago
- The Data and Code of Prompt2Sign: The First Comprehensive Multilingual Sign Language Dataset. ☆127 · Updated 2 months ago
- A lightweight end-to-end text-to-speech model ☆79 · Updated last week
- [ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion ☆674 · Updated 2 months ago
- [ICCV 2023] Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation ☆263 · Updated 10 months ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3 ☆335 · Updated last week
- SEED-Story: Multimodal Long Story Generation with Large Language Model ☆695 · Updated last month
- Official implementation of the paper 'InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation' ☆179 · Updated 3 months ago
- WavJourney: Compositional Audio Creation with LLMs ☆513 · Updated 11 months ago
- Nendo is an open source platform for AI-driven audio management, intelligence, and generation. ☆116 · Updated 6 months ago
- We Speech Transcript based on LLM, in 300 lines of code. ☆117 · Updated last month
- [ECCV 2024] OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models ☆614 · Updated 2 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆118 · Updated 3 months ago
- Llama3.1 learns to Listen ☆134 · Updated last week
- Add captions to any video ☆163 · Updated 8 months ago
- Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control." ☆522 · Updated 3 weeks ago
- Using Claude Opus to reverse-engineer code from the VASA white paper (WIP; this is for La Raza 🎷) ☆206 · Updated 3 months ago
- SCEPTER is an open-source framework used for training, fine-tuning, and inference with generative models. ☆322 · Updated last month
- Multimodal Models in Real World ☆372 · Updated 2 months ago