AudioBERT 📢 : Audio Knowledge Augmented Language Model (ICASSP 2025)
☆41Feb 1, 2025Updated last year
Alternatives and similar repositories for AudioBERT
Users that are interested in AudioBERT are comparing it to the libraries listed below.
- Official Implementation of "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers" (ECCV 2024)☆26Jan 15, 2025Updated last year
- Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138)☆11Oct 15, 2020Updated 5 years ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning" (ICASSP 2023)☆31Dec 6, 2023Updated 2 years ago
- Research Papers on Efficient Neural Fields from EffL Group☆16Apr 21, 2025Updated last year
- Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)☆64Oct 29, 2021Updated 4 years ago
- In progress.☆68Mar 26, 2024Updated 2 years ago
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence)☆15Oct 27, 2023Updated 2 years ago
- ☆19Feb 2, 2023Updated 3 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆118Jan 28, 2026Updated 3 months ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆112Dec 20, 2025Updated 4 months ago
- ☆18May 4, 2025Updated last year
- Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24]☆43Oct 7, 2024Updated last year
- "Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification" ISMIR2025☆35Sep 11, 2025Updated 7 months ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"☆26Mar 27, 2024Updated 2 years ago
- ☆12Mar 11, 2025Updated last year
- Unofficial implementation JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation(https://arxiv.org/abs/2310.1…☆32Jan 19, 2024Updated 2 years ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching☆98Apr 2, 2026Updated last month
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025]☆27May 20, 2025Updated 11 months ago
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆331Dec 17, 2025Updated 4 months ago
- The open source code for SimpleSpeech series☆144Oct 8, 2024Updated last year
- Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"☆88Sep 18, 2025Updated 7 months ago
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs☆16Jul 19, 2023Updated 2 years ago
- ☆60Apr 28, 2026Updated last week
- The source code for the paper XiaoiceSing2 (interspeech2023)☆49Jan 15, 2024Updated 2 years ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆101Jul 24, 2024Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆154Dec 5, 2024Updated last year
- SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music☆27Nov 9, 2023Updated 2 years ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated last year
- Explore training for quantized models☆26Jul 12, 2025Updated 9 months ago
- Official repository of Wavehax vocoder☆68Dec 20, 2025Updated 4 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆99Nov 9, 2024Updated last year
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆112May 5, 2025Updated last year
- [InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter☆97Jul 4, 2024Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 3 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding☆51May 1, 2025Updated last year
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability☆108Jan 17, 2025Updated last year
- Codebase and project page for EDMSound☆35Nov 20, 2023Updated 2 years ago
- ☆26Sep 10, 2025Updated 7 months ago