LoieSun/Auto-ACD

Code for "A Large-scale Dataset for Audio-Language Representation Learning"

☆14

Alternatives and similar repositories for Auto-ACD

Users interested in Auto-ACD are comparing it to the repositories listed below.

MAGIC-AI4Med / RP3D-Diag
Code implementation of RP3D-Diag
☆17 · Nov 25, 2024 · Updated last year
MAGIC-AI4Med / SAT
The official repository for "One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts"
☆10 · Aug 16, 2024 · Updated last year
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35 · Feb 11, 2026 · Updated 5 months ago
haoningwu3639 / SimpleSDM-Video
A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers.
☆20 · Feb 15, 2024 · Updated 2 years ago
MAGIC-AI4Med / KEP
[ECCV 2024 Oral] Knowledge-enhanced pretraining for computational pathology
☆50 · Apr 17, 2026 · Updated 3 months ago
Hannieliao / Baton
Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"
☆32 · Mar 4, 2025 · Updated last year
stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
MAGIC-AI4Med / M3Builder
The official codes for "M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging"
☆45 · Jul 28, 2025 · Updated 11 months ago
BriansIDP / AudioVisualLLM
☆19 · May 19, 2024 · Updated 2 years ago
MAGIC-AI4Med / RadABench
The official codes for "Can Modern LLMs Act as Agent Cores in Radiology Environments?"
☆29 · Jan 22, 2025 · Updated last year
MAGIC-AI4Med / RaTEScore
[EMNLP 2024] RaTEScore: A Metric for Radiology Report Generation
☆67 · May 18, 2025 · Updated last year
zexupan / USEV
☆14 · Jul 1, 2024 · Updated 2 years ago
AV-Reasoner / AV-Reasoner
☆19 · Jul 22, 2025 · Updated last year
ljy19970415 / AutoRG-Brain
The official codes for "AutoRG-Brain: Grounded Report Generation for Brain MRI".
☆59 · Jan 6, 2026 · Updated 6 months ago
Lzq5 / Video-Text-Alignment
☆28 · Jul 18, 2025 · Updated last year
Go2Heart / EchoSight
[EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
☆90 · Jan 19, 2026 · Updated 6 months ago
zexupan / avse_hybrid_loss
☆16 · Jun 15, 2022 · Updated 4 years ago
WangHelin1997 / SoloAudio
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
☆119 · Jan 28, 2026 · Updated 5 months ago
XinhaoMei / ACT
Source code for the paper 'Audio Captioning Transformer'
☆56 · Jan 18, 2022 · Updated 4 years ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
hmartelb / avlit
Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Mo…
☆20 · Sep 1, 2023 · Updated 2 years ago
snap-research / GenAU
☆53 · Mar 24, 2026 · Updated 3 months ago
jinxiang-liu / anno-free-AVS
Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"
☆38 · Oct 11, 2024 · Updated last year
MrZilinXiao / AutoVER
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14 · Mar 2, 2024 · Updated 2 years ago
drivendataorg / snomed-ct-entity-linking-runtime
Runtime repository for the SNOMED CT Entity Linking challenge on DrivenData
☆14 · Mar 5, 2024 · Updated 2 years ago
shlizee / savvy
Repository for the SAVVY (Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing) Benchmark and SAVVY model
☆25 · May 30, 2026 · Updated last month
PapayaResearch / ctag
[ICML'24] Creative Text-to-Audio Generation via Synthesizer Programming
☆41 · Sep 26, 2024 · Updated last year
BaoBaoGitHub / Hungyi_Lee_Machine_Learning_2021
Notes for Hung-yi Lee's Machine Learning 2021 course
☆14 · Nov 27, 2022 · Updated 3 years ago
Engineev / solutions
My personal solutions to some textbook problems
☆12 · Feb 12, 2020 · Updated 6 years ago
MediaBrain-SJTU / K-Diag
☆10 · Aug 20, 2023 · Updated 2 years ago
MrGiovanni / OnlineLearning
[MICCAI 2024] Embracing Massive Medical Data
☆21 · Jul 5, 2024 · Updated 2 years ago
xavierfav / feature-comparison-clustering
Comparing Audio Features for Unsupervised Sound Classification
☆10 · Jun 22, 2022 · Updated 4 years ago
Andong-Li-speech / Neural-Vocoders-as-Speech-Enhancers
☆52 · Sep 10, 2024 · Updated last year
JiabenChen / iQuery
[CVPR 2023] iQuery: Instruments as Queries for Audio-Visual Sound Separation
☆73 · Jul 25, 2023 · Updated 2 years ago
jczhang02 / MUSIC_dataset_script
This repo contains a script to download the MUSIC dataset from YouTube
☆12 · Jan 19, 2024 · Updated 2 years ago
Ego4DSounds / Ego4DSounds
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆21 · Jun 14, 2024 · Updated 2 years ago
liuhuadai / AudioLCM
PyTorch implementation of AudioLCM: efficient and high-quality text-to-audio generation with a latent consistency model.
☆13 · Jun 15, 2024 · Updated 2 years ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆63 · Sep 13, 2024 · Updated last year
juselara1 / dmae
TensorFlow implementation of the Dissimilarity Mixture Autoencoder: https://arxiv.org/abs/2006.08177
☆13 · Dec 8, 2022 · Updated 3 years ago