2U1 / Llama3.2-Vision-Finetune
An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.
☆122 Updated this week
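This page does not show the repository's own training entry points, but a LoRA fine-tuning pass over Llama3.2-Vision can be sketched with the standard Hugging Face transformers and peft APIs. The model ID, LoRA hyperparameters, and sample data below are illustrative assumptions, not this repo's exact code:

```python
# Minimal sketch of LoRA fine-tuning for Llama3.2-Vision with the standard
# Hugging Face transformers + peft stack. Model ID, LoRA settings, and sample
# data are illustrative assumptions, not this repository's exact setup.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed base checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach LoRA adapters to the attention projections so only a small
# fraction of the weights is trained.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# One illustrative supervised step on a single image/text pair.
image = Image.open("example.jpg")  # hypothetical sample
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "A placeholder caption."},
    ]},
]
text = processor.apply_chat_template(messages)
inputs = processor(images=image, text=text, return_tensors="pt").to(model.device)

# For brevity, every token is supervised; a real pipeline would mask the
# prompt and image tokens out of the labels.
labels = inputs["input_ids"].clone()
loss = model(**inputs, labels=labels).loss
loss.backward()
```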
Alternatives and similar repositories for Llama3.2-Vision-Finetune:
Users interested in Llama3.2-Vision-Finetune are comparing it to the libraries listed below.
- An open-source implementation for fine-tuning the Qwen2-VL series by Alibaba Cloud. ☆173 Updated this week
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆224 Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆188 Updated 3 weeks ago
- ☆73 Updated 10 months ago
- ☆292 Updated this week
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai. ☆43 Updated this week
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆109 Updated 2 months ago
- A family of highly capable yet efficient large multimodal models ☆176 Updated 5 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆132 Updated last week
- An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft. ☆81 Updated this week
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆76 Updated 7 months ago
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆127 Updated last month
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆53 Updated 2 months ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆223 Updated 5 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs ☆220 Updated this week
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024) ☆152 Updated 6 months ago
- Official code for the paper "Mantis: Multi-Image Instruction Tuning" (TMLR 2024) ☆196 Updated this week
- LLM2CLIP makes SOTA pretrained CLIP models even more SOTA. ☆452 Updated last week
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆80 Updated this week
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆128 Updated 2 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, … ☆227 Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆80 Updated 2 weeks ago
- Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆143 Updated this week
- ✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆147 Updated last month
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ☆284 Updated last month
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆122 Updated 3 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆259 Updated this week
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25] ☆119 Updated this week
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆267 Updated 3 weeks ago
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im… ☆107 Updated 8 months ago