HJ-Ok / AudioBERT
AudioBERT: Audio Knowledge Augmented Language Model (ICASSP 2025)
☆36 Updated last week
Alternatives and similar repositories for AudioBERT:
Users who are interested in AudioBERT are comparing it to the repositories listed below.
- ☆34 Updated 9 months ago
- Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer ☆32 Updated 5 months ago
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' ☆16 Updated 6 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆74 Updated last month
- ☆23 Updated 5 months ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆44 Updated 6 months ago
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024) ☆76 Updated last month
- Interface Design for Self-Supervised Speech Models, accepted to Interspeech 2024 ☆15 Updated 2 months ago
- GPT for FACodec ☆13 Updated 10 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆49 Updated 3 months ago
- A spoken version of the textual story cloze benchmark ☆14 Updated last year
- Codebase and project page for EDMSound ☆33 Updated last year
- ☆18 Updated 8 months ago
- ☆13 Updated last year
- ☆29 Updated last year
- Implementation of Google's USM speech model in PyTorch ☆27 Updated this week
- ESLTTS dataset ☆16 Updated last week
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆27 Updated 5 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆38 Updated 5 months ago
- ☆46 Updated 2 months ago
- Text-To-Speech for NotebookLM ☆27 Updated last month
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors ☆16 Updated last week
- GPT-style network for phonemization with durations of text ☆64 Updated 10 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆21 Updated 4 months ago
- ☆33 Updated 2 months ago
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec. ☆47 Updated last year
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆36 Updated last year
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin… ☆49 Updated last year
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning ☆11 Updated 7 months ago
- Collection of scripts from mHuBERT-147. ☆24 Updated 2 months ago