shuaijiang / Ke-Omni-R
Ke-Omni-R is an advanced audio reasoning model that achieved SOTA on MMAU
☆42 Updated last month
Alternatives and similar repositories for Ke-Omni-R
Users who are interested in Ke-Omni-R are comparing it to the repositories listed below
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆48 Updated last year
- Official release of StyleTalk dataset. ☆67 Updated last year
- The open source code for LLM-Codec ☆137 Updated 11 months ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization ☆97 Updated 2 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆60 Updated 9 months ago
- ☆96 Updated last month
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators ☆50 Updated 2 months ago
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆50 Updated last year
- ☆32 Updated last year
- Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges. ☆128 Updated last week
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆61 Updated 9 months ago
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model ☆67 Updated 3 weeks ago
- ☆13 Updated last year
- ☆41 Updated 6 months ago
- A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos. ☆58 Updated 9 months ago
- ☆40 Updated 11 months ago
- "SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts" ☆74 Updated 2 years ago
- Towards Comprehensive Benchmark for End-to-End Spoken Dialogue Models ☆35 Updated last week
- Code and pretrained models for "DUB: Discrete Unit Back-translation for Speech Translation" (ACL 2023 Findings) ☆28 Updated 2 years ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆54 Updated 3 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling ☆85 Updated 9 months ago
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr… ☆19 Updated last year
- ☆65 Updated last month
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆73 Updated 9 months ago
- ☆18 Updated last month
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆83 Updated 3 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆107 Updated 2 months ago
- A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp/pp. ☆142 Updated this week
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆79 Updated 7 months ago
- ☆135 Updated 3 months ago