meituan-longcat / LongCat-Flash-Omni
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
☆460Updated last week
Alternatives and similar repositories for LongCat-Flash-Omni
Users who are interested in LongCat-Flash-Omni are comparing it to the repositories listed below
- MiMo-Audio: Audio Language Models are Few-Shot Learners☆963Updated 4 months ago
- LongCat Audio Tokenizer and Detokenizer☆282Updated last week
- ☆185Updated 11 months ago
- ☆572Updated 2 weeks ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.☆575Updated 3 months ago
- Your faithful, impartial partner for genuine audio evaluation: know yourself, know your rivals.☆260Updated last week
- 🤗 R1-AQA Model: mispeech/r1-aqa☆314Updated 10 months ago
- OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model☆105Updated 6 months ago
- ☆257Updated 8 months ago
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.☆628Updated 3 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆216Updated 11 months ago
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation☆425Updated 2 months ago
- ☆77Updated 8 months ago
- GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters☆700Updated last month
- [NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.☆188Updated last month
- ☆342Updated 9 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM☆362Updated 8 months ago
- Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.☆711Updated last week
- ☆160Updated 2 months ago
- ☆114Updated 3 months ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)☆76Updated 10 months ago
- Github repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.☆173Updated 7 months ago
- Reproduction of the llama-omni training code☆73Updated last year
- Efficient audio understanding with general audio captions☆398Updated 2 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM☆294Updated 8 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊☆272Updated last year
- Text-audio foundation model from Boson AI☆117Updated 4 months ago
- ☆77Updated 4 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆299Updated 2 months ago
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…☆123Updated 2 months ago