snap-research/GenAU

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/snap-research/GenAU)

snap-research / GenAU

☆53

Alternatives and similar repositories for GenAU

Users that are interested in GenAU are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ilpoviertola / V-AURA
View on GitHub
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35Feb 11, 2026Updated 5 months ago
soham97 / ADIFF
View on GitHub
Explaining audio differences using language
☆16Feb 11, 2025Updated last year
snap-research / AVLink
View on GitHub
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
☆17Aug 3, 2025Updated 11 months ago
uvavision / SelfEQ
View on GitHub
[CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".
☆28Mar 1, 2024Updated 2 years ago
snap-research / elit
View on GitHub
☆37Mar 24, 2026Updated 4 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
microsoft / NoAudioCaptioning
View on GitHub
Repository for "Training Audio Captioning Models without Audio"
☆10Sep 26, 2023Updated 2 years ago
soham97 / mellow
View on GitHub
small audio language model for reasoning
☆88Dec 4, 2025Updated 7 months ago
audio-captioning / caption-evaluation-tools
View on GitHub
Tools for the evaluation of audio captioning.
☆19May 23, 2020Updated 6 years ago
merlresearch / sebbs
View on GitHub
Prediction of sound event bounding boxes (SEBBs)
☆35Aug 2, 2024Updated last year
microsoft / AudioEntailment
View on GitHub
Audio Entailment: Deductive Reasoning for Audio Understanding
☆17Dec 10, 2024Updated last year
hkchengrex / av-benchmark
View on GitHub
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…
☆79Feb 14, 2026Updated 5 months ago
NKU-HLT / AudioEditor
View on GitHub
☆47Apr 2, 2025Updated last year
bytedance / Make-An-Audio-2
View on GitHub
a text-conditional diffusion probabilistic model capable of generating high fidelity audio.
☆197May 29, 2024Updated 2 years ago
MoayedHajiAli / ElasticDiffusion-official
View on GitHub
The official Pytorch Implementation for ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Sepa…
☆160Dec 24, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
NazirNayal8 / UEM-likelihood-ratio
View on GitHub
Official Code for "A Likelihood Ratio-Based Approach to Segmenting Unknown Objects" [IJCV 2025]
☆15Jun 9, 2025Updated last year
DragonLiu1995 / video-to-audio-through-text
View on GitHub
[NeurIPS 2024] Code, Dataset, Samples for the VATT paper “ Tell What You Hear From What You See - Video to Audio Generation Through Text”
☆38Jul 24, 2025Updated 11 months ago
v-iashin / Synchformer
View on GitHub
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆130Sep 15, 2025Updated 10 months ago
Hannieliao / Baton
View on GitHub
Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"
☆32Mar 4, 2025Updated last year
yuhanghe01 / RiTTA
View on GitHub
Event Relation in Text-to-Audio (TTA) Generation
☆21Feb 26, 2025Updated last year
zeyuxie29 / PicoAudio
View on GitHub
☆45Jan 13, 2025Updated last year
wsntxxn / UniFlow-Audio
View on GitHub
☆73Jul 17, 2026Updated last week
PolyPerceiver-Lab / STAV2A
View on GitHub
☆20Aug 11, 2025Updated 11 months ago
Jiang-Yidi / UniCodec
View on GitHub
[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…
☆156May 30, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
WangHelin1997 / SoloAudio
View on GitHub
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
☆119Jan 28, 2026Updated 5 months ago
gwh22 / LAFMA
View on GitHub
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆44Jun 13, 2024Updated 2 years ago
Labbeti / conette-audio-captioning
View on GitHub
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
☆23Dec 17, 2025Updated 7 months ago
facebookresearch / MetaEmbed
View on GitHub
[ICLR 2026 Oral] Official Implementation of the paper "MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interactio…
☆18Jul 2, 2026Updated 3 weeks ago
ETH-DISCO / sao-instruct
View on GitHub
Official repo for SAO-Instruct: Free-form Audio Editing using Natural Language Instructions presented at NeurIPS 2025
☆18Oct 28, 2025Updated 8 months ago
youngsheen / GPST
View on GitHub
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
☆70Nov 1, 2024Updated last year
JishengBai / AudioSetCaps
View on GitHub
A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline
☆208Dec 13, 2024Updated last year
sony / soundctm
View on GitHub
Pytorch implementation of SoundCTM
☆101Mar 31, 2025Updated last year
LoieSun / Auto-ACD
View on GitHub
code for A Large-scale Dataset for Audio-Language Representation Learning
☆14Sep 18, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Ming-er / Audio-Free-P-Tuning
View on GitHub
☆11Dec 28, 2023Updated 2 years ago
happylittlecat2333 / Auffusion
View on GitHub
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati…
☆194Mar 25, 2024Updated 2 years ago
XYPB / CondFoleyGen
View on GitHub
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆93Dec 8, 2023Updated 2 years ago
wonjune-kang / expressive-speech-retrieval
View on GitHub
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15Aug 18, 2025Updated 11 months ago
Text-to-Audio / Make-An-Audio-3
View on GitHub
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
☆121May 19, 2025Updated last year
gzhu06 / Cacophony
View on GitHub
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆49Jan 19, 2026Updated 6 months ago
Sreyan88 / ReCLAP
View on GitHub
☆33Dec 23, 2025Updated 7 months ago