Farzad-R / Finetune-LLAVA-NEXT
This repository contains code for fine-tuning the LLAVA-1.6-7b-mistral model (a multimodal LLM).
☆32Updated 4 months ago
Alternatives and similar repositories for Finetune-LLAVA-NEXT:
Users who are interested in Finetune-LLAVA-NEXT are comparing it to the repositories listed below.
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆33Updated last month
- An implementation of fine-tuning the BLIP model for Visual Question Answering☆64Updated last year
- AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding…☆36Updated 2 weeks ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆84Updated last month
- Image Classification Testing with LLMs☆63Updated last year
- Quick exploration into fine-tuning Florence-2☆305Updated 6 months ago
- ☆29Updated last year
- Florence-2☆60Updated last month
- Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan…☆36Updated 8 months ago
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset☆51Updated last year
- InstructionGPT-4☆39Updated last year
- vision language models finetuning notebooks & use cases (paligemma - florence .....)☆19Updated 6 months ago
- EdgeSAM model for use with Autodistill.☆26Updated 9 months ago
- ☆47Updated last year
- Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts"☆31Updated 5 months ago
- LLaVA inference with multiple images at once for cross-image analysis.☆48Updated last year
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.☆53Updated 2 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆145Updated 9 months ago
- 📚 Text Classification with LoRA (Low-Rank Adaptation) of Language Models - Efficiently fine-tune large language models for text classifi…☆47Updated last year
- ☆19Updated last year
- Supporting code for: Video Enriched Retrieval Augmented Generation Using Aligned Video Captions☆24Updated 8 months ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models☆256Updated last year
- AICITY2024 Track 2 - Code from AIO_ISC Team☆31Updated 8 months ago
- An open-source implementation for fine-tuning SmolVLM.☆17Updated 2 months ago
- Implementation for paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"☆64Updated 3 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆316Updated 8 months ago
- An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.☆144Updated last month
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆58Updated 9 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆149Updated 6 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆88Updated last year