VITA-MLLM / Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
☆63 Updated this week
Related projects
Alternatives and complementary repositories for Freeze-Omni
- SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer ☆96 Updated this week
- The official repository of the SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆49 Updated last month
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆81 Updated 2 weeks ago
- Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆136 Updated last year
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24) ☆29 Updated 3 months ago
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations ☆30 Updated last month
- Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System. ☆61 Updated this week
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆119 Updated 3 weeks ago
- The open source code for LLM-Codec ☆114 Updated 2 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆51 Updated 2 months ago
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆42 Updated 4 months ago
- Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491). ☆122 Updated 4 months ago
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation". ☆60 Updated 3 months ago
- Trying to reproduce Suno v3 ☆25 Updated 7 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆34 Updated last week
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice ☆131 Updated last month
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" (ICASSP 2023) ☆83 Updated last year
- flow mirror models from JZX AI Labs ☆40 Updated last month
- The reproduction code for Qwen-Audio fine-tuning ☆23 Updated 2 months ago
- Source for the Interspeech 2024 paper "Scaling up masked audio encoder learning for general audio classification" ☆44 Updated 2 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆36 Updated 5 months ago
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆102 Updated last month
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆113 Updated 6 months ago
- BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing ☆45 Updated 8 months ago
- [AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS ☆63 Updated 8 months ago
- Code and pretrained models for "DUB: Discrete Unit Back-translation for Speech Translation" (ACL 2023 Findings) ☆26 Updated last year
- Official release of StyleTalk dataset. ☆57 Updated 4 months ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆157 Updated 4 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆38 Updated last week