Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)
☆76Mar 16, 2025Updated 11 months ago
Alternatives and similar repositories for EMOVA
Users who are interested in EMOVA are comparing it to the repositories listed below
- A project for tri-modal LLM benchmarking and instruction tuning.☆57Mar 27, 2025Updated 11 months ago
- Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projectio…☆19Oct 20, 2025Updated 4 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆126Dec 9, 2024Updated last year
- Codes and datasets for our ICASSP2023 paper, Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech und…☆42Mar 12, 2023Updated 2 years ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆44Mar 3, 2025Updated last year
- Survey on speech generation work.☆21Nov 26, 2023Updated 2 years ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models☆59Jun 7, 2024Updated last year
- BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing☆59Mar 11, 2024Updated last year
- The official code for “Dance-to-Music Generation with Encoder-based Textual Inversion”☆22Jun 17, 2025Updated 8 months ago
- ☆115Sep 18, 2025Updated 5 months ago
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…☆127Nov 8, 2025Updated 4 months ago
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.☆37Apr 7, 2025Updated 11 months ago
- Actually released!☆10Feb 24, 2021Updated 5 years ago
- Reproduction of the llama-omni training code☆74Jan 23, 2025Updated last year
- Streamlit YOLOv5 deployment template☆27Jul 18, 2025Updated 7 months ago
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2.☆34Mar 11, 2025Updated 11 months ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning (ICASSP 2023)"☆31Dec 6, 2023Updated 2 years ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"☆32Mar 4, 2025Updated last year
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆74Feb 27, 2026Updated last week
- cbReader - A simple web-based comic book reader (CBZ/CBR)☆10May 21, 2018Updated 7 years ago
- ☆24Feb 4, 2026Updated last month
- ☆18Jun 10, 2025Updated 8 months ago
- official code for CVPR'24 paper Diff-BGM☆71Oct 12, 2024Updated last year
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.☆134Sep 19, 2025Updated 5 months ago
- [TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation☆31Sep 6, 2024Updated last year
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image …☆88Jun 18, 2024Updated last year
- Suite for phonetic word embeddings, especially their evaluation and baseline models.☆37Mar 3, 2025Updated last year
- Official implementation of "MoST: Motion Style Transformer between Diverse Action Contents"☆37Jun 26, 2024Updated last year
- ☆12Sep 25, 2023Updated 2 years ago
- A working FE Bypass for all Roblox clients☆19Jan 10, 2026Updated last month
- Just prepare config file and start training your metric learning model with ease☆16Apr 2, 2024Updated last year
- arxiv daily for speech translation, legal. Ref: Vincentqyw/cv-arxiv-daily☆15Jan 6, 2025Updated last year
- ☆14Dec 5, 2025Updated 3 months ago
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z…☆32Apr 8, 2022Updated 3 years ago
- Cannabis strain information☆10Feb 20, 2016Updated 10 years ago
- Desktop client for Walltaker powered by golang☆12Sep 13, 2022Updated 3 years ago
- A pytorch implementation of D3Net.☆11Aug 8, 2021Updated 4 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee…☆81Jun 7, 2024Updated last year
- Official source codes of airsep☆39Mar 26, 2024Updated last year