Farzad-R / Finetune-LLAVA-NEXT
This repository contains code for fine-tuning the LLAVA-1.6-7b-mistral model (a multimodal LLM).
☆33Updated 5 months ago
Alternatives and similar repositories for Finetune-LLAVA-NEXT:
Users who are interested in Finetune-LLAVA-NEXT are comparing it to the libraries listed below.
- Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan…☆47Updated 9 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆60Updated 2 months ago
- An open-source implementation for fine-tuning SmolVLM.☆25Updated 3 weeks ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆84Updated 2 months ago
- AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding…☆39Updated last month
- An implementation of fine-tuning the BLIP model for Visual Question Answering☆65Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆319Updated 9 months ago
- [EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabi…☆78Updated 7 months ago
- Real-time, YOLO-like object detection using the Florence-2-base-ft model with a user-friendly GUI.☆23Updated last month
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆58Updated 10 months ago
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"☆105Updated 4 months ago
- LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning☆150Updated last year
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.☆54Updated this week
- [Fully open] [Encoder-free MLLM] Vision as LoRA☆138Updated last week
- Image/Instance Retrieval using CLIP, a self-supervised learning model☆28Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆91Updated 4 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆52Updated 5 months ago
- [CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval☆14Updated 3 weeks ago
- Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts"☆31Updated 6 months ago
- Pretraining and finetuning for visual instruction following with Mixture of Experts☆13Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 8 months ago
- Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications☆45Updated 5 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆125Updated 10 months ago
- Finetuning CLIP for Few Shot Learning