Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)
☆76Mar 16, 2025Updated 11 months ago
Alternatives and similar repositories for EMOVA
Users that are interested in EMOVA are comparing it to the libraries listed below
Sorting:
- Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projectio…☆19Oct 20, 2025Updated 4 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆126Dec 9, 2024Updated last year
- Codes and datasets for our ICASSP2023 paper, Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech und…☆42Mar 12, 2023Updated 2 years ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆44Mar 3, 2025Updated last year
- BLSP-Emo: Towards Empathetic Large Speech-Language Models☆58Jun 7, 2024Updated last year
- BLSP: Bootstrapping Langauge-Speech Pre-training via Behavior Alignment of Continuation Writing☆59Mar 11, 2024Updated last year
- The official code for “Dance-to-Music Generation with Encoder-based Textual Inversion“☆22Jun 17, 2025Updated 8 months ago
- ☆115Sep 18, 2025Updated 5 months ago
- a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.☆37Apr 7, 2025Updated 11 months ago
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…☆127Nov 8, 2025Updated 4 months ago
- Actually released!☆10Feb 24, 2021Updated 5 years ago
- llama-omni训练代码复现☆74Jan 23, 2025Updated last year
- Streamlit YOLOv5 deployment template☆27Jul 18, 2025Updated 7 months ago
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2.