emova-ollm / EMOVA
Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)
☆34 · Updated 2 months ago
Alternatives and similar repositories for EMOVA
Users who are interested in EMOVA are comparing it to the repositories listed below.
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆49 · Updated this week
- A project for tri-modal LLM benchmarking and instruction tuning. ☆34 · Updated last month
- LMM solving catastrophic forgetting, AAAI 2025 ☆42 · Updated last month
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆66 · Updated this week
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate". ☆98 · Updated 5 months ago
- MIO: A Foundation Model on Multimodal Tokens ☆25 · Updated 5 months ago
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model. ☆14 · Updated last month
- ☆154 · Updated 3 months ago
- ☆68 · Updated 2 weeks ago
- ☆72 · Updated last month
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆97 · Updated 5 months ago
- [MM 2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆55 · Updated 9 months ago
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi… ☆22 · Updated last week
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆38 · Updated last month
- ☆28 · Updated this week
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆111 · Updated last month
- Official repo for StableLLAVA ☆95 · Updated last year
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs ☆96 · Updated this week
- [CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆157 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 · Updated 2 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆54 · Updated last week
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆107 · Updated 3 weeks ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆158 · Updated 2 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆98 · Updated 8 months ago
- [NeurIPS 2024] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 · Updated 7 months ago
- Official repository of MMDU dataset ☆90 · Updated 7 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆24 · Updated 4 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows ☆60 · Updated 2 months ago
- A modified LLaVA framework for MOSS2 that makes MOSS2 a multimodal model. ☆13 · Updated 7 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆90 · Updated last week