andimarafioti / florence2-finetuning

Quick exploration into fine-tuning Florence-2

☆334

Alternatives and similar repositories for florence2-finetuning

Users who are interested in florence2-finetuning are comparing it to the repositories listed below

microsoft / LLM2CLIP
LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.
☆563 Updated 4 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
☆390 Updated last week
AviSoori1x / seemore
From scratch implementation of a vision language model in pure PyTorch
☆248 Updated last year
LLaVA-VL / LLaVA-Interactive-Demo
LLaVA-Interactive-Demo
☆378 Updated last year
MILVLG / imp
A family of highly capable yet efficient large multimodal models
☆191 Updated last year
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆145 Updated 3 weeks ago
zhangfaen / finetune-Qwen2-VL
☆379 Updated 9 months ago
merveenoyan / siglip
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
☆284 Updated 8 months ago
GaiZhenbiao / Phi3V-Finetuning
Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
☆58 Updated last year
retkowsky / florence-2
Florence-2
☆71 Updated 9 months ago
HyperGAI / HPT
HPT - Open Multimodal LLMs from HyperGAI
☆315 Updated last year
lucasjinreal / Namo-R1
A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.
☆239 Updated 6 months ago
LLaVA-VL / LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆760 Updated last year
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,046 Updated last year
SHI-Labs / VCoder
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
☆279 Updated last year
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆208 Updated last month
anyantudre / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan…
☆116 Updated last year
google / imageinwords
Data release for the ImageInWords (IIW) paper.
☆222 Updated last year
2U1 / Molmo-Finetune
An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.
☆58 Updated 6 months ago
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆679 Updated last year
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆412 Updated 6 months ago
hustvl / EVF-SAM
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
☆486 Updated 8 months ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆412 Updated 9 months ago
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268 Updated last year
IDEA-Research / Grounding-DINO-1.5-API
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
☆1,056 Updated 9 months ago
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆307 Updated 5 months ago
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆423 Updated 6 months ago
luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆246 Updated last year
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96 Updated 11 months ago
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆246 Updated 9 months ago