DragonLiu1995 / multimodal-llm-for-audio-gen

Code, dataset, and samples for the NeurIPS paper “Tell What You Hear From What You See -- Video to Audio Generation Through Text”.

☆8

Alternatives and similar repositories for multimodal-llm-for-audio-gen

Users interested in multimodal-llm-for-audio-gen are comparing it to the repositories listed below.

ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025)
☆27 · Updated 5 months ago
cyanbx / Frieren-V2A
Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)
☆40 · Updated 2 months ago
Text-to-Audio / Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
☆98 · Updated 2 weeks ago
FreedomIntelligence / S2S-Arena
☆16 · Updated last week
ariesssxu / vta-ldm
☆59 · Updated 10 months ago
chenqi008 / V2C
PyTorch implementation of “V2C: Visual Voice Cloning”
☆32 · Updated 2 years ago
soham97 / mellow
Small audio language model for reasoning
☆64 · Updated last month
lavendery / AudioComposer
☆23 · Updated 8 months ago
AI-S2-Lab / FluentEditor
[InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
☆54 · Updated 7 months ago
yanghaha0908 / EmoVoice
Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"
☆31 · Updated 3 weeks ago
gwx314 / TechSinger
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
☆58 · Updated last month
justinlovelace / SESD
☆61 · Updated 7 months ago
walker-hyf / GPT-Talker
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆71 · Updated 7 months ago
cwang621 / blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
☆45 · Updated 11 months ago
heng-hw / V2A-Mapper
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
☆25 · Updated last year
LAION-AI / emotional-speech-annotations
This repository contains prompts and best practices for annotating audio clips in great detail using audio-language models
☆34 · Updated 7 months ago
snap-research / GenAU
☆33 · Updated last month
gwh22 / UniVoice
☆50 · Updated 2 months ago
walker-hyf / NCSSD
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆59 · Updated 7 months ago
vivian556123 / NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆53 · Updated 4 months ago
xiquan-li / Awesome-Audio-Generation
Curated list of papers, code, and resources related to Text-to-Audio (TTA) Generation
☆28 · Updated this week
WangHelin1997 / SoloAudio
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
☆90 · Updated 5 months ago
KTTRCDL / UMETTS
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
☆30 · Updated 5 months ago
declare-lab / HyperTTS
☆35 · Updated last year
PecholaL / IDEAW
Robust Neural Audio Watermarking with Invertible Dual-Embedding
☆21 · Updated 6 months ago
microsoft / AudioEntailment
Audio Entailment: Deductive Reasoning for Audio Understanding
☆12 · Updated 5 months ago
qiuk2 / AAR
[Official Implementation] Acoustic Autoregressive Modeling 🔥
☆69 · Updated 9 months ago
0417keito / PromptTTS2
[WIP] Unofficial Implementation of Microsoft's PromptTTS2
☆51 · Updated last year
youngsheen / GPST
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
☆58 · Updated 7 months ago
PeiwenSun2000 / Both-Ears-Wide-Open
The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
☆38 · Updated 3 weeks ago