hkchengrex/av-benchmark

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/hkchengrex/av-benchmark)

hkchengrex / av-benchmark

Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs, LAION-CLAP, MS-CLAP, DeSync

☆80

Alternatives and similar repositories for av-benchmark

Users that are interested in av-benchmark are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

xiquan-li / MeanAudio
View on GitHub
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆145Sep 2, 2025Updated 10 months ago
ETH-DISCO / sao-instruct
View on GitHub
Official repo for SAO-Instruct: Free-form Audio Editing using Natural Language Instructions presented at NeurIPS 2025
☆18Oct 28, 2025Updated 9 months ago
v-iashin / Synchformer
View on GitHub
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆130Sep 15, 2025Updated 10 months ago
xiquan-li / Awesome-Audio-Generation
View on GitHub
Curated list for papers, codes and resources related to Text-to-Audio (TTA) Generation
☆75Jul 20, 2026Updated last week
yanghaha0908 / WavCube
View on GitHub
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62Jun 27, 2026Updated last month
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ilpoviertola / V-AURA
View on GitHub
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35Feb 11, 2026Updated 5 months ago
snap-research / GenAU
View on GitHub
☆53Mar 24, 2026Updated 4 months ago
ZYH-Lightyear / LVAS
View on GitHub
LVAS-Agent Code Base
☆21Apr 15, 2025Updated last year
xiaomi-research / dasheng-audiogen
View on GitHub
end-to-end text to audio scene generation model
☆50Jun 16, 2026Updated last month
wsntxxn / UniFlow-Audio
View on GitHub
☆74Jul 17, 2026Updated last week
facebookresearch / audiobox-aesthetics
View on GitHub
Unified automatic quality assessment for speech, music, and sound.
☆747Jun 5, 2025Updated last year
Xiaohao-Liu / Awesome-Vison2Audio
View on GitHub
A curated list of Vision (video/image) to Audio Generation
☆107Feb 10, 2026Updated 5 months ago
NKU-HLT / AudioEditor
View on GitHub
☆47Apr 2, 2025Updated last year
cyanbx / Frieren-V2A
View on GitHub
Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)
☆63Apr 3, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
lizhaoqing / UNISON
View on GitHub
☆43Jun 3, 2026Updated last month
wuzhiyue111 / Codec-Evaluation
View on GitHub
☆50Apr 5, 2026Updated 3 months ago
sony / mmaudiosep
View on GitHub
☆16Apr 30, 2026Updated 2 months ago
jaeyeonkim99 / visage
View on GitHub
Official implementation of "ViSAGe: Video-to-Spatial AUdio Generation" (ICLR 2025)
☆47Sep 10, 2025Updated 10 months ago
Ruiqi-Yan / Awesome-Audio-Editing
View on GitHub
A curated list of models, benchmarks, tools and guides for audio editing
☆35Jul 7, 2026Updated 3 weeks ago
zhaoyx239 / X-Translator
View on GitHub
☆26Jul 21, 2026Updated last week
yangdongchao / ALMTokenizer
View on GitHub
The demo page for ALMTokenizer
☆59Apr 14, 2025Updated last year
xiquan-li / Resonate
View on GitHub
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48Apr 17, 2026Updated 3 months ago
sony / CLIPSep
View on GitHub
☆43Feb 21, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
kszpxxzmc / ViSAudio
View on GitHub
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
☆117Dec 11, 2025Updated 7 months ago
Eps-Acoustic-Revolution-Lab / EAR_VAE
View on GitHub
[INTERSPEECH 2026] This is the official implementation for εar-VAE model including inference and evaluation parts, more details coming so…
☆91Feb 13, 2026Updated 5 months ago
zeyuxie29 / SemanticVocoder
View on GitHub
☆28Apr 6, 2026Updated 3 months ago
RayYuki / CodecBench
View on GitHub
☆24Nov 16, 2025Updated 8 months ago
qiuk2 / AAR
View on GitHub
[Official Implementation] Acoustic Autoregressive Modeling 🔥
☆74Aug 24, 2024Updated last year
bytedance / Make-An-Audio-2
View on GitHub
a text-conditional diffusion probabilistic model capable of generating high fidelity audio.
☆197May 29, 2024Updated 2 years ago
hkchengrex / MMAudio
View on GitHub
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
☆2,247Feb 23, 2026Updated 5 months ago
xiquan-li / TinyMU
View on GitHub
[ICASSP 2026] TinyMU: A Compact Audio Language Model for Music Understanding
☆37Apr 20, 2026Updated 3 months ago
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
wilkinghoff / DSpAST
View on GitHub
Code for the paper "DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models"
☆17Oct 23, 2025Updated 9 months ago
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆121Jun 21, 2026Updated last month
ddlBoJack / Omni-Captioner
View on GitHub
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142Apr 7, 2026Updated 3 months ago
yfyeung / CLSP
View on GitHub
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104Apr 6, 2026Updated 3 months ago
lysanderism / TimeAudio
View on GitHub
The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…
☆30Nov 18, 2025Updated 8 months ago
Shy-98 / MELLE
View on GitHub
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
☆41Jun 28, 2025Updated last year
lonzi / mrflow_dpo
View on GitHub
☆22Jan 3, 2026Updated 6 months ago