AudioBERT: Audio Knowledge Augmented Language Model (ICASSP 2025)
☆41 · Feb 1, 2025 · Updated last year
Alternatives and similar repositories for AudioBERT
Users who are interested in AudioBERT are comparing it to the repositories listed below.
- Official Implementation of "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers (ECCV 2024)" ☆26 · Jan 15, 2025 · Updated last year
- Official Implementation of "Prefix tuning for Automated Audio Captioning (ICASSP 2023)" ☆31 · Dec 6, 2023 · Updated 2 years ago
- Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138) ☆11 · Oct 15, 2020 · Updated 5 years ago
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence) ☆15 · Oct 27, 2023 · Updated 2 years ago
- Research Papers on Efficient Neural Fields from EffL Group ☆16 · Apr 21, 2025 · Updated 10 months ago
- Unofficial implementation of JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation (https://arxiv.org/abs/2310.1… ☆32 · Jan 19, 2024 · Updated 2 years ago
- Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24] ☆42 · Oct 7, 2024 · Updated last year
- "Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification" ISMIR 2025 ☆30 · Sep 11, 2025 · Updated 5 months ago
- ☆19 · Feb 2, 2023 · Updated 3 years ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis" ☆108 · Dec 20, 2025 · Updated 2 months ago
- ☆49 · Feb 12, 2026 · Updated 3 weeks ago
- ☆18 · May 4, 2025 · Updated 10 months ago
- A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML2025] ☆27 · May 20, 2025 · Updated 9 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆114 · Jan 28, 2026 · Updated last month
- SOTA Piano Transformer model trained on 4.2GB of Solo Piano MIDI music ☆27 · Nov 9, 2023 · Updated 2 years ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation" ☆26 · Mar 27, 2024 · Updated last year
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching ☆96 · Oct 9, 2025 · Updated 4 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆101 · Jul 24, 2024 · Updated last year
- [NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching ☆121 · Mar 27, 2025 · Updated 11 months ago
- REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR ☆14 · Dec 11, 2024 · Updated last year
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 3 months ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆12 · Mar 11, 2025 · Updated 11 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated 10 months ago
- Code for the paper "Songs Across Borders: Singable and Controllable Neural Lyric Translation" ☆25 · Feb 3, 2026 · Updated last month
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆153 · Dec 5, 2024 · Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆329 · Dec 17, 2025 · Updated 2 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆43 · Jun 13, 2024 · Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆51 · May 1, 2025 · Updated 10 months ago
- The open source code for SimpleSpeech series ☆145 · Oct 8, 2024 · Updated last year
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability ☆107 · Jan 17, 2025 · Updated last year
- The source code for the paper XiaoiceSing2 (Interspeech 2023) ☆49 · Jan 15, 2024 · Updated 2 years ago
- This repository provides the materials used in "Unsupervised Melody-to-Lyric Generation" by Yufei Tian, Anjali Narayan-Chen, Shereen Orab… ☆11 · Jul 6, 2023 · Updated 2 years ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆105 · May 5, 2025 · Updated 10 months ago
- ☆24 · Sep 10, 2025 · Updated 5 months ago
- Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with syntheti… ☆102 · Oct 15, 2025 · Updated 4 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆48 · Jan 19, 2026 · Updated last month
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆125 · Mar 20, 2025 · Updated 11 months ago
- Prosody and Pronunciation Modification Network ☆63 · May 5, 2025 · Updated 9 months ago
- Official code for SongEcho ☆41 · Feb 21, 2026 · Updated last week