wntg / LLaMA-Omni
Reproducing the llama-omni training code
☆74 · Jan 23, 2025 · Updated last year
Alternatives and similar repositories for LLaMA-Omni
Users that are interested in LLaMA-Omni are comparing it to the libraries listed below
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model. ☆37 · Apr 7, 2025 · Updated 10 months ago
- Video Benchmark Suite: Rapid Evaluation of Video Foundation Models ☆15 · Jan 10, 2025 · Updated last year
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align… ☆125 · Nov 8, 2025 · Updated 3 months ago
- ☆26 · Oct 15, 2025 · Updated 4 months ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs ☆28 · Aug 15, 2025 · Updated 6 months ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆972 · Jan 15, 2026 · Updated last month
- ☆262 · May 19, 2025 · Updated 8 months ago
- V-SWIFT: Training a Small VideoMAE Model on a Single Machine in a Day ☆29 · Feb 5, 2025 · Updated last year
- A virtual streamer is drawn by a computer and does not exist in this world. ☆13 · Jan 4, 2022 · Updated 4 years ago
- This repository collects papers related to Speech Tokenizer. ☆17 · Oct 16, 2024 · Updated last year
- Multimodal Open Source Framework for Conversational Agent Research and Development. ☆22 · Feb 16, 2025 · Updated 11 months ago
- [EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval" ☆20 · Sep 12, 2025 · Updated 5 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆252 · Mar 26, 2025 · Updated 10 months ago
- A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenizat… ☆111 · Sep 3, 2025 · Updated 5 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,123 · May 19, 2025 · Updated 8 months ago
- [CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering ☆24 · Apr 24, 2025 · Updated 9 months ago
- Extract phoneme-level timestamps from speech audio. ☆116 · Updated this week
- ☆115 · Sep 18, 2025 · Updated 4 months ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ☆76 · Mar 16, 2025 · Updated 11 months ago
- CFAD: A Chinese Dataset for Fake Audio Detection ☆23 · Jul 3, 2023 · Updated 2 years ago
- ☆64 · Sep 15, 2024 · Updated last year
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model ☆673 · May 24, 2025 · Updated 8 months ago
- The official implementation of the DIFFA series for dLLM-based large audio language model ☆59 · Feb 2, 2026 · Updated last week
- ☆29 · Jun 30, 2025 · Updated 7 months ago
- This is a general framework for fake audio detection using pytorch lightning ☆27 · Jul 24, 2025 · Updated 6 months ago
- Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC… ☆64 · Jan 27, 2026 · Updated 2 weeks ago
- Official source codes for the paper: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing. ☆37 · Jun 3, 2025 · Updated 8 months ago
- An ASR (Automatic Speech Recognition) adversarial attack repository. ☆39 · Nov 7, 2023 · Updated 2 years ago
- This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio". ☆66 · Dec 13, 2024 · Updated last year
- Margin-based Vision Transformer ☆66 · Nov 28, 2025 · Updated 2 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 · Jan 26, 2024 · Updated 2 years ago
- SAMO: SPEAKER ATTRACTOR MULTI-CENTER ONE-CLASS LEARNING FOR VOICE ANTI-SPOOFING ☆42 · Apr 5, 2023 · Updated 2 years ago
- Just prepare config file and start training your metric learning model with ease ☆16 · Apr 2, 2024 · Updated last year
- Virtual news production using Tacotron2 and Wav2Lip ☆11 · Nov 14, 2023 · Updated 2 years ago
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models ☆50 · Sep 2, 2025 · Updated 5 months ago
- The dataset of Speech Recognition ☆449 · Jan 4, 2026 · Updated last month
- Open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,524 · Nov 5, 2024 · Updated last year
- This repository includes the code to reproduce our paper "Automatic speaker verification spoofing and deepfake detection using wav2vec 2.… ☆158 · Sep 26, 2023 · Updated 2 years ago
- [ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆103 · Dec 8, 2025 · Updated 2 months ago