AudioBERT 📢 : Audio Knowledge Augmented Language Model (ICASSP 2025)
☆40Feb 1, 2025Updated last year
Alternatives and similar repositories for AudioBERT
Users that are interested in AudioBERT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official Implementation of "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers (ECCV 2024)”☆26Jan 15, 2025Updated last year
- Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138)☆11Oct 15, 2020Updated 5 years ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning(ICASSP 2023)"☆30Dec 6, 2023Updated 2 years ago
- Research Papers on Efficient Neural Fields from EffL Group☆16Apr 21, 2025Updated last year
- Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)☆64Oct 29, 2021Updated 4 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- In progress.☆68Mar 26, 2024Updated 2 years ago
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence)☆16Oct 27, 2023Updated 2 years ago
- ☆19Feb 2, 2023Updated 3 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆118Jan 28, 2026Updated 4 months ago
- [INTERSPEECH 2026]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆112Jun 3, 2026Updated last week
- ☆18May 4, 2025Updated last year
- Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24]☆43Oct 7, 2024Updated last year
- "Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification" ISMIR2025☆37Sep 11, 2025Updated 9 months ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"☆26Mar 27, 2024Updated 2 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆12Mar 11, 2025Updated last year
- Unofficial implementation JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation(https://arxiv.org/abs/2310.1…☆32Jan 19, 2024Updated 2 years ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching☆100Apr 2, 2026Updated 2 months ago
- A repo that builds text to music datasets from scratch, used in MuseContorlLite [ICML2025]☆28May 20, 2025Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆331Dec 17, 2025Updated 5 months ago
- The open source code for SimpleSpeech series☆146Oct 8, 2024Updated last year
- Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"☆91Sep 18, 2025Updated 8 months ago
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs☆16Jul 19, 2023Updated 2 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- ☆64Apr 28, 2026Updated last month
- The source code for the paper XiaoiceSing2 (interspeech2023)☆49Jan 15, 2024Updated 2 years ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆101Jul 24, 2024Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆153Dec 5, 2024Updated last year
- SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music☆28Nov 9, 2023Updated 2 years ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated last year
- Official repository of Wavehax vocoder☆73Dec 20, 2025Updated 5 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆100Nov 9, 2024Updated last year
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆114May 5, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter☆97Jul 4, 2024Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 4 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding☆53May 1, 2025Updated last year
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability☆108Jan 17, 2025Updated last year
- Codebase and project page for EDMSound☆35Nov 20, 2023Updated 2 years ago
- ☆27Sep 10, 2025Updated 9 months ago
- Official Implementation of "Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity (ICML 2024)"☆43Aug 28, 2024Updated last year