ictnlp / LLaMA-Omni2
☆156 Updated last week
Alternatives and similar repositories for LLaMA-Omni2
Users who are interested in LLaMA-Omni2 are comparing it to the repositories listed below
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆246 Updated last month
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆228 Updated last month
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆196 Updated last week
- ☆214 Updated last month
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆264 Updated 2 months ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆155 Updated last week
- ☆126 Updated last month
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆207 Updated this week
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances … ☆67 Updated 4 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆63 Updated last week
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆71 Updated 7 months ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆54 Updated last month
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆205 Updated last month
- ☆256 Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆148 Updated 3 weeks ago
- ☆359 Updated 8 months ago
- Collection of Open Source Speech Data ☆157 Updated 6 months ago
- Official implementation of the TTS model Lina-Speech ☆165 Updated 4 months ago
- VALL-E 2 reproduction ☆128 Updated 10 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆97 Updated 5 months ago
- ☆97 Updated 2 weeks ago
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ☆105 Updated last month
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ☆178 Updated 7 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆187 Updated 2 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆316 Updated 4 months ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3 ☆407 Updated 8 months ago
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice ☆292 Updated 4 months ago
- A Survey of Spoken Dialogue Models (60 pages) ☆298 Updated 5 months ago
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability. ☆233 Updated 8 months ago
- The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector (TAFFC 20… ☆90 Updated last month