Hannieliao/Baton

Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"

☆32

Alternatives and similar repositories for Baton

Users who are interested in Baton are comparing it to the repositories listed below.

VincentHancoder / ViGoR-Bench-Eval
☆34 · Apr 5, 2026 · Updated 3 months ago

VincentHancoder / REPARO
The official implementation of the work "REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment".
☆125 · Sep 14, 2024 · Updated last year

VincentHancoder / AToM
The official implementation of the work "AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward".
☆19 · Mar 25, 2025 · Updated last year

LoieSun / Auto-ACD
Code for "A Large-scale Dataset for Audio-Language Representation Learning".
☆14 · Sep 18, 2024 · Updated last year

juhayna-zh / BSRNN-speech-preprocess
A solution for denoising and separating noisy two-speaker mixed speech, using a BSRNN-inspired network.
☆15 · Aug 22, 2023 · Updated 2 years ago

gwh22 / LAFMA
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆44 · Jun 13, 2024 · Updated 2 years ago

lyk412 / Consistent123
[ACMMM 2024] Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors
☆25 · Oct 22, 2024 · Updated last year

Sreyan88 / CompA
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆23 · Jul 10, 2024 · Updated 2 years ago

TiffanyBlews / MozartsTouch
Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
☆43 · Mar 17, 2026 · Updated 4 months ago

PapayaResearch / ctag
[ICML'24] Creative Text-to-Audio Generation via Synthesizer Programming
☆41 · Sep 26, 2024 · Updated last year

AmphionTeam / Emilia-NV
Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"
☆91 · Sep 18, 2025 · Updated 10 months ago

WangHelin1997 / SoloAudio
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
☆119 · Jan 28, 2026 · Updated 5 months ago

kaihuhuang / Language-Group
☆11 · Dec 24, 2024 · Updated last year

snap-research / GenAU
☆53 · Mar 24, 2026 · Updated 3 months ago

JusperLee / Look2hear
A toolkit for researchers in multimodal sound separation.
☆16 · Oct 20, 2023 · Updated 2 years ago

haoheliu / SemantiCodec
☆45 · Jun 11, 2024 · Updated 2 years ago

soham97 / mellow
A small audio language model for reasoning.
☆88 · Dec 4, 2025 · Updated 7 months ago

liuhuadai / AudioLCM
PyTorch implementation of AudioLCM: efficient, high-quality text-to-audio generation with a latent consistency model.
☆13 · Jun 15, 2024 · Updated 2 years ago

Tencent / HaploVLM
ICML 2025
☆63 · Aug 28, 2025 · Updated 10 months ago

NKU-HLT / AudioEditor
☆47 · Apr 2, 2025 · Updated last year

Aisaka0v0 / CLAPSep
Query-conditioned target sound extraction model
☆30 · Mar 25, 2025 · Updated last year

Bai-YT / ConsistencyTTA
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
☆39 · Nov 20, 2024 · Updated last year

LiChaiUSTC / CSL-L2M
☆18 · May 4, 2025 · Updated last year

YuejieGao / TG-CRITIC
TG-CRITIC: A Timbre-Guided Model for Reference-Independent Singing Evaluation
☆18 · May 26, 2023 · Updated 3 years ago

johnmartinsson / differentiable-mel-spectrogram
The official implementation of DMEL, the method presented in the paper "DMEL: The differentiable log-Mel spectrogram as a trainable layer …
☆24 · Dec 21, 2024 · Updated last year

zeyuxie29 / PicoAudio
☆45 · Jan 13, 2025 · Updated last year

qiuqiangkong / music_llm
☆56 · Jul 13, 2025 · Updated last year

dmksjfl / PAR
Official code for Cross-Domain Policy Adaptation by Capturing Representation Mismatch (ICML 2024)
☆15 · Aug 15, 2025 · Updated 11 months ago

HalleyYoung / generative-grammar-music
☆10 · Sep 29, 2015 · Updated 10 years ago

genisplaja / diffusion-vocal-sep
Code for "A diffusion-inspired training strategy for singing voice extraction in the waveform domain" (ISMIR 2022)
☆17 · Feb 16, 2023 · Updated 3 years ago

LiChenda / Multi-clue-TSE-data
Data simulation scripts for paper "Target Sound Extraction with Variable Cross-modality Clues"
☆17 · May 19, 2023 · Updated 3 years ago

frankenliu / LOAE
☆10 · Sep 25, 2024 · Updated last year

he-nantian / ReDiffuser
ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation
☆15 · Jun 2, 2024 · Updated 2 years ago

XinhaoMei / ACT
Source code for the paper 'Audio Captioning Transformer'
☆56 · Jan 18, 2022 · Updated 4 years ago

h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
☆13 · Mar 23, 2026 · Updated 3 months ago

heng-hw / V2A-Mapper
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
☆29 · Dec 14, 2023 · Updated 2 years ago

7Xin / DPI-TTS
☆13 · Sep 12, 2024 · Updated last year

FrontierLabs / F5R-TTS
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
☆169 · Mar 3, 2026 · Updated 4 months ago

microsoft / AudioEntailment
Audio Entailment: Deductive Reasoning for Audio Understanding
☆17 · Dec 10, 2024 · Updated last year