Reproduction of the LLaMA-Omni training code
☆74Jan 23, 2025Updated last year
Alternatives and similar repositories for LLaMA-Omni
Users that are interested in LLaMA-Omni are comparing it to the libraries listed below
- a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.☆37Apr 7, 2025Updated 11 months ago
- Video Benchmark Suite: Rapid Evaluation of Video Foundation Models☆15Jan 10, 2025Updated last year
- [ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]☆15Jul 15, 2025Updated 7 months ago
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…☆127Nov 8, 2025Updated 4 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM☆368May 27, 2025Updated 9 months ago
- ☆26Oct 15, 2025Updated 4 months ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆28Aug 15, 2025Updated 6 months ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model☆995Jan 15, 2026Updated last month
- ☆263May 19, 2025Updated 9 months ago
- V-SWIFT: Training a Small VideoMAE Model on a Single Machine in a Day☆29Feb 5, 2025Updated last year
- Virtual streamers are drawn by computer and do not exist in this world.☆14Jan 4, 2022Updated 4 years ago
- This repository collects papers related to Speech Tokenizer.☆17Oct 16, 2024Updated last year
- [EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"☆20Sep 12, 2025Updated 5 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip)☆254Mar 26, 2025Updated 11 months ago
- ☆80Aug 11, 2025Updated 6 months ago
- A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenizat…☆114Sep 3, 2025Updated 6 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…☆3,128May 19, 2025Updated 9 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs)☆40Aug 11, 2025Updated 6 months ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)☆76Mar 16, 2025Updated 11 months ago
- ☆64Sep 15, 2024Updated last year
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model☆675May 24, 2025Updated 9 months ago
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.☆63Sep 18, 2025Updated 5 months ago
- The official implementation of the DIFFA series for dLLM-based large audio language model☆66Updated this week
- This is a general framework for fake audio detection using pytorch lightning☆27Jul 24, 2025Updated 7 months ago
- Official source codes for the paper: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing.☆37Jun 3, 2025Updated 9 months ago
- ☆30Jun 30, 2025Updated 8 months ago
- Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC…☆65Jan 27, 2026Updated last month
- This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".☆66Dec 13, 2024Updated last year
- An ASR (Automatic Speech Recognition) adversarial attack repository.☆39Nov 7, 2023Updated 2 years ago
- Margin-based Vision Transformer☆67Nov 28, 2025Updated 3 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge.☆32Jan 26, 2024Updated 2 years ago
- SAMO: SPEAKER ATTRACTOR MULTI-CENTER ONE-CLASS LEARNING FOR VOICE ANTI-SPOOFING☆41Apr 5, 2023Updated 2 years ago
- Just prepare config file and start training your metric learning model with ease☆16Apr 2, 2024Updated last year
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models☆50Sep 2, 2025Updated 6 months ago
- This repository includes the code to reproduce our paper "Automatic speaker verification spoofing and deepfake detection using wav2vec 2.…☆159Sep 26, 2023Updated 2 years ago
- [ACMMM'2024] Generative Expressive Conversational Speech Synthesis☆44Oct 28, 2024Updated last year
- ☆42Apr 2, 2025Updated 11 months ago
- Pytorch implementation of "LEVERAGING POSITIONAL-RELATED LOCAL-GLOBAL DEPENDENCY FOR SYNTHETIC SPEECH DETECTION"☆37Jul 24, 2023Updated 2 years ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆183Feb 28, 2026Updated last week