Farzad-R / Finetune-LLAVA-NEXT
This repository contains code for fine-tuning the LLaVA-1.6-7b-mistral (multimodal LLM) model.
☆40 · Updated last year
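For orientation, below is a minimal sketch of what fine-tuning this model with LoRA can look like using the Hugging Face `transformers` and `peft` libraries. This is not the repository's actual training script; the `llava-hf/llava-v1.6-mistral-7b-hf` checkpoint id and the LoRA hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not this repo's actual code): load the LLaVA-1.6 Mistral-7B
# checkpoint and attach LoRA adapters with PEFT. Model id and LoRA settings
# are assumptions for illustration only.
import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed HF-format checkpoint

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach low-rank adapters to every module named q_proj / v_proj (these match
# attention projections in both the language model and the vision tower);
# only the small adapter weights are trained, the base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The repository itself may organize training differently (e.g., its own notebooks or data collators), so consult its code for the exact data format and training loop.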
Alternatives and similar repositories for Finetune-LLAVA-NEXT
Users interested in Finetune-LLAVA-NEXT are comparing it to the libraries listed below.
- An implementation of fine-tuning the BLIP model for Visual Question Answering ☆83 · Updated 2 years ago
- Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan… ☆138 · Updated last year
- LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning ☆190 · Updated last year
- An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta. ☆173 · Updated 2 months ago
- [EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabi… ☆79 · Updated last year
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support. ☆145 · Updated 11 months ago
- [CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval ☆33 · Updated 3 months ago
- An open-source implementation for fine-tuning SmolVLM. ☆60 · Updated 3 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆72 · Updated 2 years ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆168 · Updated last year
- AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding… ☆50 · Updated 9 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆335 · Updated last year
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆43 · Updated last year
- Image Instance Segmentation - Zero Shot - OpenAI's CLIP + Meta's SAM ☆73 · Updated 2 years ago
- Contextual Object Detection with Multimodal Large Language Models ☆256 · Updated last year
- An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft. ☆98 · Updated 3 months ago
- Code for studying OpenAI's CLIP explainability ☆37 · Updated 4 years ago
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆157 · Updated 5 months ago
- Visual self-questioning for large vision-language assistants. ☆45 · Updated 5 months ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆53 · Updated last year
- Fine-tuning CLIP for Few-Shot Learning ☆47 · Updated 3 years ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆261 · Updated 5 months ago
- [BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization ☆20 · Updated last year
- PyTorch implementation of image captioning using a transformer-based model. ☆68 · Updated 2 years ago
- PyTorch implementation code for adding new features to the code of Segment-Anything. Here, the features support batch-input on the fu… ☆166 · Updated 2 years ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆102 · Updated last year
- Odd-One-Out: Anomaly Detection by Comparing with Neighbors (CVPR25) ☆54 · Updated last year
- Holds code for our CVPR'23 tutorial: All Things ViTs: Understanding and Interpreting Attention in Vision. ☆196 · Updated 2 years ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆30 · Updated last year
- [CVPR 2024] Official Implementation of GEM (Grounding Everything Module) ☆135 · Updated 9 months ago