ictnlp / LLaMA-Omni2
☆188 · Updated last month
Alternatives and similar repositories for LLaMA-Omni2
Users that are interested in LLaMA-Omni2 are comparing it to the libraries listed below
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆259 · Updated last month
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆224 · Updated 2 weeks ago
- ☆238 · Updated 2 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆237 · Updated 3 months ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆280 · Updated last week
- Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning. ☆211 · Updated this week
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆90 · Updated last month
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆56 · Updated last month
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆74 · Updated 8 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆104 · Updated last month
- Collection of Open Source Speech Data ☆159 · Updated 7 months ago
- Official implementation of the TTS model Lina-Speech ☆165 · Updated 5 months ago
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆212 · Updated last month
- ☆365 · Updated 9 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆179 · Updated 2 months ago
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances … ☆69 · Updated this week
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆227 · Updated last week
- Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune ☆51 · Updated last month
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆169 · Updated last month
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆39 · Updated last week
- An unofficial PyTorch implementation of VALL-E ☆87 · Updated 3 weeks ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts). ☆77 · Updated this week
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆85 · Updated 3 weeks ago
- VALL-E 2 reproduction ☆129 · Updated 11 months ago
- Ke-Omni-R is an advanced audio reasoning model and achieved SOTA on MMAU ☆29 · Updated 2 weeks ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆258 · Updated 3 weeks ago
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ☆110 · Updated last month
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 · Updated 3 weeks ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆198 · Updated 3 months ago
- ☆258 · Updated last year