2U1 / Molmo-Finetune
An open-source implementaion for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.
☆29Updated last month
Related projects ⓘ
Alternatives and complementary repositories for Molmo-Finetune
- ☆74Updated 8 months ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆74Updated last week
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆69Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated 2 weeks ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆179Updated last month
- ☆58Updated 9 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)☆184Updated this week
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆115Updated last week
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆51Updated 3 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆38Updated 4 months ago
- ☆105Updated 3 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆62Updated 3 weeks ago
- Code for our Paper "All in an Aggregated Image for In-Image Learning"☆29Updated 7 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆102Updated last month
- Making LLaVA Tiny via MoE-Knowledge Distillation☆60Updated 3 weeks ago
- a family of highly capabale yet efficient large multimodal models☆166Updated 2 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video.☆47Updated this week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 5 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…☆181Updated last month
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM☆36Updated 5 months ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆59Updated 3 weeks ago
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆142Updated last week
- ☆45Updated last year
- Official repository of MMDU dataset☆75Updated last month
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆17Updated last month
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆132Updated last month
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆86Updated 8 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆26Updated last month
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆35Updated last month