meituan-longcat / LongCat-Flash-Omni
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
☆156Updated this week
Alternatives and similar repositories for LongCat-Flash-Omni
Users who are interested in LongCat-Flash-Omni are comparing it to the repositories listed below
- LongCat Audio Tokenizer and Detokenizer☆196Updated this week
- OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model☆89Updated 3 months ago
- ☆67Updated last month
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆81Updated last month
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLM☆160Updated this week
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆84Updated last year
- ☆242Updated 5 months ago
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation☆297Updated last week
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆124Updated 10 months ago
- ☆103Updated 2 weeks ago
- [NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.☆172Updated 2 weeks ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆215Updated 8 months ago
- ☆78Updated 6 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.☆107Updated last month
- ☆58Updated 4 months ago
- ☆117Updated 2 months ago
- Ke-Omni-R is an advanced audio reasoning model that achieved SOTA on MMAU☆54Updated 4 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆178Updated last year
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances …☆82Updated 4 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆306Updated 7 months ago
- a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.☆33Updated 6 months ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆40Updated last month
- ☆49Updated 2 months ago
- ☆180Updated 8 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆37Updated last year
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆56Updated 6 months ago
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.☆76Updated last year
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On…☆220Updated 5 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (such as fish/cosy/chattts).☆88Updated 3 weeks ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis☆322Updated 3 months ago