meituan-longcat / LongCat-Flash-Omni
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
★413 · Updated last week
Alternatives and similar repositories for LongCat-Flash-Omni
Users who are interested in LongCat-Flash-Omni are comparing it to the repositories listed below.
- 🤗 R1-AQA Model: mispeech/r1-aqa ★306 · Updated 7 months ago
- ★181 · Updated 9 months ago
- LongCat Audio Tokenizer and Detokenizer ★252 · Updated last week
- MiMo-Audio: Audio Language Models are Few-Shot Learners ★859 · Updated 2 months ago
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLM ★166 · Updated last week
- ★249 · Updated 6 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ★216 · Updated 8 months ago
- ★78 · Updated 6 months ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ★538 · Updated 3 weeks ago
- [NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges. ★177 · Updated last month
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ★358 · Updated 6 months ago
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model ★94 · Updated 4 months ago
- ★329 · Updated 7 months ago
- ★104 · Updated last month
- Github repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey. ★151 · Updated 5 months ago
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation ★382 · Updated 2 weeks ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ★55 · Updated 7 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems ★84 · Updated last year
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM ★269 · Updated 10 months ago
- Efficient audio understanding with general audio captions ★381 · Updated 3 weeks ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ★74 · Updated 8 months ago
- ★149 · Updated last week
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ★124 · Updated 11 months ago
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model. ★35 · Updated 7 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ★290 · Updated 6 months ago
- A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contai… ★41 · Updated 7 months ago
- ★118 · Updated 2 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ★37 · Updated last year
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language. ★580 · Updated 3 weeks ago
- A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp. ★222 · Updated 3 months ago