AudioBERT 📢 : Audio Knowledge Augmented Language Model (ICASSP 2025)
☆41Feb 1, 2025Updated last year
Alternatives and similar repositories for AudioBERT
Users that are interested in AudioBERT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official Implementation of "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers (ECCV 2024)”☆26Jan 15, 2025Updated last year
- Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138)☆11Oct 15, 2020Updated 5 years ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning(ICASSP 2023)"☆31Dec 6, 2023Updated 2 years ago
- Research Papers on Efficient Neural Fields from EffL Group☆16Apr 21, 2025Updated 11 months ago
- Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)☆64Oct 29, 2021Updated 4 years ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- In progress.☆68Mar 26, 2024Updated 2 years ago
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence)☆15Oct 27, 2023Updated 2 years ago
- ☆19Feb 2, 2023Updated 3 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆114Jan 28, 2026Updated last month
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆109Dec 20, 2025Updated 3 months ago
- ☆18May 4, 2025Updated 10 months ago
- Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24]☆43Oct 7, 2024Updated last year
- "Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification" ISMIR2025☆35Sep 11, 2025Updated 6 months ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"☆26Mar 27, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- ☆12Mar 11, 2025Updated last year
- Unofficial implementation JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation(https://arxiv.org/abs/2310.1…☆32Jan 19, 2024Updated 2 years ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching☆96Oct 9, 2025Updated 5 months ago
- A repo that builds text to music datasets from scratch, used in MuseContorlLite [ICML2025]☆27May 20, 2025Updated 10 months ago
- The open source code for SimpleSpeech series☆145Oct 8, 2024Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆330Dec 17, 2025Updated 3 months ago
- ☆50Feb 12, 2026Updated last month
- Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"☆86Sep 18, 2025Updated 6 months ago
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs☆16Jul 19, 2023Updated 2 years ago
- The source code for the paper XiaoiceSing2 (interspeech2023)☆49Jan 15, 2024Updated 2 years ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆101Jul 24, 2024Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆154Dec 5, 2024Updated last year
- SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music☆27Nov 9, 2023Updated 2 years ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated last year
- Official repository of Wavehax vocoder☆66Dec 20, 2025Updated 3 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆97Nov 9, 2024Updated last year
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆108May 5, 2025Updated 10 months ago
- NordVPN Special Discount Offer • AdSave on top-rated NordVPN 1 or 2-year plans with secure browsing, privacy protection, and support for for all major platforms.
- [InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter☆93Jul 4, 2024Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding☆50May 1, 2025Updated 10 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 2 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability☆107Jan 17, 2025Updated last year
- Codebase and project page for EDMSound☆35Nov 20, 2023Updated 2 years ago
- ☆24Sep 10, 2025Updated 6 months ago
- [NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching☆123Mar 27, 2025Updated 11 months ago