2U1 / Pixtral-Finetune

An open-source implementation for fine-tuning Pixtral by MistralAI.

☆16

Alternatives and similar repositories for Pixtral-Finetune

Users who are interested in Pixtral-Finetune are comparing it to the repositories listed below.

2U1 / Llama3.2-Vision-Finetune
An open-source implementation for fine-tuning Llama3.2-Vision series by Meta.
☆164 Updated 2 weeks ago
2U1 / Phi3-Vision-Finetune
An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft.
☆95 Updated last month
zjysteven / lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…
☆297 Updated 3 months ago
gokayfem / awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
☆850 Updated 3 months ago
zhangfaen / finetune-Qwen2-VL
☆362 Updated 3 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
☆379 Updated last month
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆321 Updated 10 months ago
2U1 / Qwen2-VL-Finetune
An open-source implementation for fine-tuning Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
☆784 Updated this week
mapluisch / LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
☆51 Updated last year
dino-chiio / blip-vqa-finetune
An implementation of fine-tuning the BLIP model for Visual Question Answering
☆68 Updated last year
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆267 Updated 11 months ago
zjysteven / VLM-Visualizer
Visualizing the attention of vision-language models
☆181 Updated 3 months ago
2U1 / Molmo-Finetune
An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.
☆55 Updated last month
aimagelab / LLaVA-MORE
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
☆138 Updated last month
allenai / molmo
Code for the Molmo Vision-Language Model
☆431 Updated 5 months ago
friedrichor / Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
☆197 Updated last week
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]
☆215 Updated 2 months ago
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of …
☆486 Updated 9 months ago
JindongGu / Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation …
☆466 Updated 2 months ago
TIGER-AI-Lab / UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆151 Updated 8 months ago
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆865 Updated 2 months ago
harrytea / Awesome-Document-Understanding
Document Artificial Intelligence
☆170 Updated last month
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆244 Updated 4 months ago
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆145 Updated 2 months ago
kyegomez / PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
☆90 Updated last year
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆886 Updated 6 months ago
mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆279 Updated 2 years ago
sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆78 Updated 4 months ago
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆279 Updated 8 months ago
TIGER-AI-Lab / VLM2Vec
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25]
☆237 Updated this week