AudioBERT: Audio Knowledge Augmented Language Model (ICASSP 2025)
☆41 · Feb 1, 2025 · Updated last year
Alternatives and similar repositories for AudioBERT
Users interested in AudioBERT are comparing it to the libraries listed below.
- Official Implementation of "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers (ECCV 2024)" · ☆26 · Jan 15, 2025 · Updated last year
- Code for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138) · ☆11 · Oct 15, 2020 · Updated 5 years ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning (ICASSP 2023)" · ☆31 · Dec 6, 2023 · Updated 2 years ago
- Research Papers on Efficient Neural Fields from EffL Group · ☆16 · Apr 21, 2025 · Updated last year
- Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021) · ☆64 · Oct 29, 2021 · Updated 4 years ago
- In progress. · ☆68 · Mar 26, 2024 · Updated 2 years ago
- UTAUTAI (Unrestricted Tune Automated Technology Artificial Interigence) · ☆16 · Oct 27, 2023 · Updated 2 years ago
- ☆19 · Feb 2, 2023 · Updated 3 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. · ☆117 · Jan 28, 2026 · Updated 3 months ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis" · ☆112 · Dec 20, 2025 · Updated 5 months ago
- ☆18 · May 4, 2025 · Updated last year
- Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24] · ☆43 · Oct 7, 2024 · Updated last year
- "Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification" ISMIR 2025 · ☆36 · Sep 11, 2025 · Updated 8 months ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation" · ☆26 · Mar 27, 2024 · Updated 2 years ago
- ☆12 · Mar 11, 2025 · Updated last year
- Unofficial implementation of JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation (https://arxiv.org/abs/2310.1…) · ☆32 · Jan 19, 2024 · Updated 2 years ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching · ☆98 · Apr 2, 2026 · Updated last month
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025] · ☆27 · May 20, 2025 · Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer · ☆331 · Dec 17, 2025 · Updated 5 months ago
- The open source code for the SimpleSpeech series · ☆145 · Oct 8, 2024 · Updated last year
- Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling" · ☆91 · Sep 18, 2025 · Updated 8 months ago
- The source code for the paper CrossSinger (asru2023) · ☆18 · Oct 12, 2023 · Updated 2 years ago
- ☆61 · Apr 28, 2026 · Updated 3 weeks ago
- SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs · ☆16 · Jul 19, 2023 · Updated 2 years ago
- The source code for the paper XiaoiceSing2 (interspeech2023) · ☆49 · Jan 15, 2024 · Updated 2 years ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' · ☆101 · Jul 24, 2024 · Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities · ☆154 · Dec 5, 2024 · Updated last year
- SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music · ☆27 · Nov 9, 2023 · Updated 2 years ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers · ☆125 · Mar 20, 2025 · Updated last year
- Official repository of Wavehax vocoder · ☆72 · Dec 20, 2025 · Updated 5 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling · ☆99 · Nov 9, 2024 · Updated last year
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. · ☆112 · May 5, 2025 · Updated last year
- [InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter · ☆97 · Jul 4, 2024 · Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 · ☆49 · Jan 19, 2026 · Updated 4 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding · ☆52 · May 1, 2025 · Updated last year
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability · ☆108 · Jan 17, 2025 · Updated last year
- Codebase and project page for EDMSound · ☆35 · Nov 20, 2023 · Updated 2 years ago
- ☆27 · Sep 10, 2025 · Updated 8 months ago
- Official Implementation of "Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity (ICML 2024)" · ☆43 · Aug 28, 2024 · Updated last year