HJ-Ok / AudioBERT
AudioBERT 📢: Audio Knowledge Augmented Language Model (ICASSP 2025)
☆41 · Updated 11 months ago
Alternatives and similar repositories for AudioBERT
Users who are interested in AudioBERT are comparing it to the repositories listed below.
- ☆38 · Updated last year
- Collection of scripts from mHuBERT-147. ☆32 · Updated last year
- ☆49 · Updated 5 months ago
- GPT for FACodec ☆13 · Updated last year
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆45 · Updated 3 months ago
- An official implementation of Style-Talker for Spoken Dialogue Generation ☆23 · Updated last year
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆59 · Updated last year
- The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral) ☆48 · Updated 4 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆67 · Updated last year
- Codebase and project page for EDMSound ☆35 · Updated 2 years ago
- A spoken version of the textual story cloze benchmark ☆20 · Updated 2 years ago
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024) ☆94 · Updated last year
- Accompanying repository for the paper "DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions" ☆37 · Updated 2 months ago
- Text-To-Speech for NotebookLM ☆36 · Updated 5 months ago
- GPT-style network for phonemization with durations of text ☆68 · Updated last year
- ZIQI-Eval: A Music Evaluation Benchmark for Large Language Models ☆15 · Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆113 · Updated last year
- Small audio language model for reasoning ☆84 · Updated last month
- ESLTTS dataset ☆16 · Updated 11 months ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs ☆42 · Updated 3 months ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless, and TortoiseTTS into one ☆26 · Updated last year
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆46 · Updated last year
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr… ☆24 · Updated last year
- Voice conversion with just linear regression. ☆32 · Updated 3 months ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆51 · Updated 9 months ago
- ☆16 · Updated 2 years ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆40 · Updated 11 months ago
- My vocoder experiments ☆31 · Updated 5 months ago
- ☆19 · Updated last year
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆15 · Updated 7 months ago