AudioBERT 📢 : Audio Knowledge Augmented Language Model (ICASSP 2025)
☆41Feb 1, 2025Updated last year
Alternatives and similar repositories for AudioBERT
Users that are interested in AudioBERT are comparing it to the libraries listed below.
- Official Implementation of "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers (ECCV 2024)"☆26Jan 15, 2025Updated last year
- Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138)☆11Oct 15, 2020Updated 5 years ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning (ICASSP 2023)"☆31Dec 6, 2023Updated 2 years ago
- Research Papers on Efficient Neural Fields from EffL Group☆16Apr 21, 2025Updated 11 months ago
- Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)☆64Oct 29, 2021Updated 4 years ago
- In progress.☆68Mar 26, 2024Updated 2 years ago
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence)☆15Oct 27, 2023Updated 2 years ago
- ☆19Feb 2, 2023Updated 3 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆115Jan 28, 2026Updated 2 months ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆110Dec 20, 2025Updated 3 months ago
- ☆18May 4, 2025Updated 11 months ago
- Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24]☆43Oct 7, 2024Updated last year
- "Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification" ISMIR2025☆35Sep 11, 2025Updated 7 months ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"☆26Mar 27, 2024Updated 2 years ago
- ☆12Mar 11, 2025Updated last year
- Unofficial implementation JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation(https://arxiv.org/abs/2310.1…☆32Jan 19, 2024Updated 2 years ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching☆98Apr 2, 2026Updated 2 weeks ago
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025]☆27May 20, 2025Updated 10 months ago
- The open source code for SimpleSpeech series☆144Oct 8, 2024Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆331Dec 17, 2025Updated 3 months ago
- Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"☆87Sep 18, 2025Updated 6 months ago
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs☆16Jul 19, 2023Updated 2 years ago
- ☆57Feb 12, 2026Updated 2 months ago
- The source code for the paper XiaoiceSing2 (interspeech2023)☆49Jan 15, 2024Updated 2 years ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆100Jul 24, 2024Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆154Dec 5, 2024Updated last year
- SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music☆27Nov 9, 2023Updated 2 years ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated last year
- Official repository of Wavehax vocoder☆67Dec 20, 2025Updated 3 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆97Nov 9, 2024Updated last year
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆110May 5, 2025Updated 11 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding☆50May 1, 2025Updated 11 months ago
- [InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter☆93Jul 4, 2024Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 2 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability☆107Jan 17, 2025Updated last year
- Codebase and project page for EDMSound☆35Nov 20, 2023Updated 2 years ago
- ☆25Sep 10, 2025Updated 7 months ago
- Official Implementation of "Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity (ICML 2024)"☆43Aug 28, 2024Updated last year