VITA-MLLM / Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
☆270Updated last month
Alternatives and similar repositories for Freeze-Omni:
Users who are interested in Freeze-Omni are comparing it to the repositories listed below
- A Survey of Spoken Dialogue Models (60 pages)☆261Updated 2 months ago
- ☆187Updated 4 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊☆260Updated 2 weeks ago
- Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.☆152Updated 3 months ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…☆190Updated last month
- Reproduction of the llama-omni training code☆42Updated 3 weeks ago
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆525Updated 8 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆77Updated 2 months ago
- Speech, Language, Audio, Music Processing with Large Language Model☆713Updated this week
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice☆246Updated last month
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…☆387Updated last year
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆387Updated 5 months ago
- Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)☆364Updated this week
- ☆203Updated 2 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆81Updated last year
- Real-time Speech-Text Foundation Model Toolkit (wip)☆128Updated 4 months ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆87Updated last month
- Audio Large Language Models☆366Updated last month
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆144Updated 8 months ago
- Unofficial implementation of Megatts2☆275Updated 10 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)☆139Updated last year
- VoiceBench: Benchmarking LLM-Based Voice Assistants☆108Updated last week
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate☆479Updated 2 months ago
- We Speech Transcript based on LLM, in 300 lines of code.☆142Updated 2 weeks ago
- Awesome speech/audio LLMs, representation learning, and codec models☆883Updated this week
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances …☆62Updated last month
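- The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using text-input large models, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.☆542Updated last year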
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"☆825Updated 5 months ago
- flow mirror models from JZX AI Labs☆42Updated 4 months ago