wntg / LLaMA-Omni
Reproduction of the LLaMA-Omni training code
☆41 · Updated last week
Alternatives and similar repositories for LLaMA-Omni:
Users interested in LLaMA-Omni are comparing it to the repositories listed below:
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆82 · Updated 3 weeks ago
- A Survey of Spoken Dialogue Models (60 pages) ☆251 · Updated 2 months ago
- Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System. ☆151 · Updated 2 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆126 · Updated 3 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆75 · Updated last month
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆49 · Updated 7 months ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆167 · Updated 6 months ago
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice ☆239 · Updated 2 weeks ago
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement ☆138 · Updated last month
- ASR papers, updated every day ☆111 · Updated this week
- The open source code for LLM-Codec ☆125 · Updated 5 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆95 · Updated last week
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆263 · Updated 3 weeks ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆140 · Updated last year
- ☆149 · Updated 6 months ago
- BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing ☆49 · Updated 10 months ago
- Retrieval-Augmented MOS Prediction with Prior Knowledge Integration ☆17 · Updated last month
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud… ☆92 · Updated last month
- ☆37 · Updated this week
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆49 · Updated 3 weeks ago
- This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey". ☆112 · Updated 2 weeks ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆101 · Updated this week
- [INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation ☆37 · Updated last year
- EMO-SUPERB submission ☆42 · Updated 4 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling ☆61 · Updated 2 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆42 · Updated 7 months ago
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup. ☆63 · Updated 3 months ago
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations ☆44 · Updated 2 weeks ago
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data ☆29 · Updated last year
- FastThresholdClustering is an efficient vector clustering algorithm based on FAISS, particularly suitable for large-scale vector data clu… ☆22 · Updated last month