HJ-Ok / AudioBERT
AudioBERT: Audio Knowledge Augmented Language Model (ICASSP 2025)
☆41, updated last year
Alternatives and similar repositories for AudioBERT
Users who are interested in AudioBERT are comparing it to the repositories listed below.
- ☆38, updated last year
- Collection of scripts from mHuBERT-147. ☆32, updated last year
- An official implementation of Style-Talker for Spoken Dialogue Generation. ☆23, updated last year
- ☆51, updated 6 months ago
- ☆19, updated last year
- The official code for the SALMon benchmark (ICASSP 2025 - Oral). ☆48, updated 5 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer. ☆67, updated last year
- A spoken version of the textual story cloze benchmark. ☆20, updated 2 years ago
- ☆53, updated last year
- GPT for FACodec. ☆13, updated last year
- Codebase and project page for EDMSound. ☆35, updated 2 years ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency. ☆59, updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆113, updated this week
- Accompanying repository for the paper "DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions". ☆37, updated 3 months ago
- Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer. ☆38, updated 11 months ago
- Interface Design for Self-Supervised Speech Models, accepted to Interspeech 2024. ☆16, updated last year
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT". ☆45, updated this week
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆38, updated 2 years ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆46, updated last year
- My vocoder experiments. ☆31, updated 6 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model. ☆57, updated 2 years ago
- Official Implementation of EnCLAP (ICASSP 2024). ☆94, updated last year
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec. ☆48, updated 2 years ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT. ☆40, updated last year
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one. ☆26, updated last year
- Voice conversion with just linear regression. ☆32, updated 4 months ago
- ☆61, updated 2 years ago
- Small audio language model for reasoning. ☆86, updated last month
- ☆14, updated last year
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024). ☆94, updated last year