HJ-Ok/AudioBERT

AudioBERT 📢 : Audio Knowledge Augmented Language Model (ICASSP 2025)

☆40

Alternatives and similar repositories for AudioBERT

Users interested in AudioBERT are comparing it to the repositories listed below.

effl-lab / MaskedKD
Official Implementation of "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers (ECCV 2024)"
☆26 · Jan 15, 2025 · Updated last year
jaeho-lee / oce
Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138)
☆11 · Oct 15, 2020 · Updated 5 years ago
minguinho26 / Prefix_AAC_ICASSP2023
Official Implementation of "Prefix tuning for Automated Audio Captioning(ICASSP 2023)"
☆30 · Dec 6, 2023 · Updated 2 years ago
effl-lab / Fast-Neural-Fields
Research Papers on Efficient Neural Fields from EffL Group
☆16 · Apr 21, 2025 · Updated last year
jaeho-lee / MetaSparseINR
Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)
☆64 · Oct 29, 2021 · Updated 4 years ago
jaeho-lee / layer-adaptive-sparsity
In progress.
☆69 · Mar 26, 2024 · Updated 2 years ago
0417keito / UTAUTAI
UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence)
☆17 · Oct 27, 2023 · Updated 2 years ago
CODEJIN / XiaoiceSing2
☆19 · Feb 2, 2023 · Updated 3 years ago
SWivid / AUV
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
☆28 · Oct 11, 2025 · Updated 9 months ago
zhai-lw / SQCodec
A lightweight audio codec based on a single quantizer
☆72 · Aug 15, 2025 · Updated 11 months ago
LiChaiUSTC / CSL-L2M
☆18 · May 4, 2025 · Updated last year
Audio-AGI / dcase2024_task9_baseline
Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"
☆26 · Mar 27, 2024 · Updated 2 years ago
JusperLee / Gull-Codec-Training
☆12 · Mar 11, 2025 · Updated last year
0417keito / JEN-1-COMPOSER-pytorch
Unofficial implementation JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation(https://arxiv.org/abs/2310.1…
☆32 · Jan 19, 2024 · Updated 2 years ago
yangdongchao / SimpleSpeech
The open source code for SimpleSpeech series
☆147 · Oct 8, 2024 · Updated last year
haidog-yaqub / EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
☆333 · Dec 17, 2025 · Updated 7 months ago
yl4579 / SLMGAN
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
☆16 · Jul 19, 2023 · Updated 3 years ago
asigalov61 / Euterpe-X
[DEPRECIATED] [PyTorch 2.0] [638M] [85.33% acc] Full-attention multi-instrumental music transformer for supervised music generation, opti…
☆33 · Nov 23, 2023 · Updated 2 years ago
malradhi / PACodec
[ICASSP 2026] Official code for "Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum"
☆27 · Jan 22, 2026 · Updated 6 months ago
Sreyan88 / GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
☆153 · Dec 5, 2024 · Updated last year
AgentCooper2002 / EDMSound
Codebase and project page for EDMSound
☆35 · Nov 20, 2023 · Updated 2 years ago
yzGuu830 / efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126 · Mar 20, 2025 · Updated last year
zengchang233 / CrossSinger
The source code for the paper CrossSinger (asru2023)
☆18 · Oct 12, 2023 · Updated 2 years ago
zengchang233 / xiaoicesing2
The source code for the paper XiaoiceSing2 (interspeech2023)
☆49 · Jan 15, 2024 · Updated 2 years ago
habla-liaa / encodecmae
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
☆101 · Jul 24, 2024 · Updated 2 years ago
seungheondoh / music-text-representation-pp
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24]
☆43 · Oct 7, 2024 · Updated last year
thuhcsi / VoxInstruct
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
☆100 · Nov 9, 2024 · Updated last year
asigalov61 / GIGA-Piano-XL
SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music
☆28 · Nov 9, 2023 · Updated 2 years ago
BakerBunker / FreeV
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
☆98 · Jul 4, 2024 · Updated 2 years ago
WangHelin1997 / SoloAudio
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
☆121 · Jan 28, 2026 · Updated 5 months ago
gzhu06 / Cacophony
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆49 · Jan 19, 2026 · Updated 6 months ago
seungheondoh / musical-word-embedding
Musical Word Embedding for Music Tagging and Retrieval [IEEE TASLP]
☆29 · Apr 23, 2024 · Updated 2 years ago
mulab-mir / muchomusic
MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models.
☆46 · Dec 3, 2024 · Updated last year
effl-lab / TACO
Official Implementation of "Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity (ICML 2024)"
☆44 · Aug 28, 2024 · Updated last year
winddori2002 / DEX-TTS
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
☆108 · Jan 17, 2025 · Updated last year
xiquan-li / FineLAP
[ACL 2026 Main] FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pre-training
☆36 · Apr 20, 2026 · Updated 3 months ago
SonyResearch / VRVQ
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54 · May 1, 2025 · Updated last year
chomeyama / wavehax
Official repository of Wavehax vocoder
☆75 · Dec 20, 2025 · Updated 7 months ago
luotianze666 / WaveFM
[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
☆133 · Apr 8, 2026 · Updated 3 months ago