XiaomiMiMo / MiMo-Audio-Eval
☆66Updated last month
Alternatives and similar repositories for MiMo-Audio-Eval
Users who are interested in MiMo-Audio-Eval are comparing it to the libraries listed below.
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.☆105Updated last month
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆81Updated last month
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr…☆23Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆40Updated last month
- ☆38Updated last year
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".☆63Updated last year
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model☆89Updated 3 months ago
- ☆103Updated last week
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆65Updated 11 months ago
- ☆40Updated 3 months ago
- MOSS-Speech is a true speech-to-speech large language model without text guidance.☆58Updated 3 weeks ago
- ☆28Updated last week
- ☆42Updated 6 months ago
- An official implementation of Style-Talker for Spoken Dialogue Generation☆23Updated 9 months ago
- ☆19Updated 4 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆76Updated last year
- BLSP-Emo: Towards Empathetic Large Speech-Language Models☆49Updated last year
- ☆19Updated last month
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"☆29Updated 7 months ago
- A spoken version of the textual story cloze benchmark☆19Updated 2 years ago
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.☆76Updated last year
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24)☆46Updated last year
- ☆35Updated last year
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated last year
- ☆28Updated 3 months ago
- Official release of StyleTalk dataset.☆70Updated last year
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆27Updated 2 months ago
- [ACMMM'2024] Generative Expressive Conversational Speech Synthesis☆40Updated last year
- Official Code for ParrotTTS☆57Updated last year
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".☆25Updated last year