XiaomiMiMo / MiMo-Audio-Eval
☆44Updated this week
Alternatives and similar repositories for MiMo-Audio-Eval
Users who are interested in MiMo-Audio-Eval are comparing it to the repositories listed below
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆80Updated 3 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.☆70Updated this week
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated last year
- ☆35Updated last week
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr…☆20Updated last year
- ☆26Updated 3 weeks ago
- ☆38Updated 2 months ago
- GPT-style network for phonemization with durations of text☆67Updated last year
- ☆37Updated last year
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆61Updated 10 months ago
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24)☆45Updated last year
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis☆39Updated 2 years ago
- Anim-400K: A dataset designed from the ground up for automated dubbing of video☆108Updated last year
- ☆22Updated 11 months ago
- Official release of StyleTalk dataset.☆69Updated last year
- A spoken version of the textual story cloze benchmark☆18Updated 2 years ago
- An official implementation of Style-Talker for Spoken Dialogue Generation☆22Updated 8 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆37Updated last year
- ESLTTS dataset☆16Updated 7 months ago
- Pushing the Limits of Zero-shot End-to-End Speech Translation☆26Updated 9 months ago
- Text-to-speech with a learnable audio encoder, without alignment to a transcript reference☆38Updated last week
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆28Updated last week
- ☆40Updated last year
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".☆63Updated last year
- ☆32Updated last year
- ☆23Updated 3 months ago
- Official Code for ParrotTTS☆55Updated 11 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models☆49Updated last year
- ☆14Updated this week
- ☆40Updated 5 months ago