aimagelab / LLaVA-MORE
LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1
☆86 · Updated last month
Related projects
Alternatives and complementary repositories for LLaVA-MORE
- [NeurIPS 2024] Dense Connector for MLLMs ☆140 · Updated last month
- LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆212 · Updated 3 months ago
- Code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" ☆74 · Updated last week
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆134 · Updated 5 months ago
- Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆137 · Updated 2 weeks ago
- [ECCV 2024] Official implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆82 · Updated 4 months ago
- [ACL 2024 Oral] Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆52 · Updated 2 months ago
- Official repository for the paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆121 · Updated 5 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆47 · Updated 4 months ago
- [WACV 2025] Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B… ☆81 · Updated 2 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆89 · Updated last week
- [NeurIPS 2024] Official PyTorch implementation of Mamba-based traversal of rationale (Meteor) to im… ☆102 · Updated 5 months ago
- Official repo for StableLLAVA ☆91 · Updated 11 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆175 · Updated 4 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆132 · Updated last month
- ChatterBox: Multi-round Multimodal Referring and Grounding ☆50 · Updated 6 months ago
- [NeurIPS 2024] Official implementation of "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆32 · Updated 5 months ago
- [ECCV 2024] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆125 · Updated 2 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆64 · Updated last week
- Official implementation of the Law of Vision Representation in MLLMs ☆134 · Updated this week
- Matryoshka Multimodal Models ☆84 · Updated this week
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆245 · Updated 10 months ago
- Official implementation of the paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters" ☆42 · Updated 3 weeks ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆297 · Updated 4 months ago