ANYANTUDRE / Florence-2-Vision-Language-Model

Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

☆150

Alternatives and similar repositories for Florence-2-Vision-Language-Model

Users who are interested in Florence-2-Vision-Language-Model compare it to the repositories listed below.

IDEA-Research / RexSeek
[ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark
☆177 Updated 3 months ago
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆210 Updated 3 months ago
2U1 / SmolVLM-Finetune
An open-source implementation for fine-tuning SmolVLM.
☆62 Updated 4 months ago
hustvl / EVF-SAM
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
☆495 Updated 10 months ago
UCSC-VLAA / OpenVision
[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
☆454 Updated last week
IDEA-Research / Rex-Thinker
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
☆142 Updated 7 months ago
retkowsky / florence-2
Florence-2
☆72 Updated 11 months ago
JIA-Lab-research / VisionReasoner
Vision Manus: Your versatile Visual AI assistant
☆318 Updated this week
NVlabs / PS3
Scaling Vision Pre-Training to 4K Resolution
☆221 Updated last month
merveenoyan / siglip
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
☆298 Updated 11 months ago
andimarafioti / florence2-finetuning
Quick exploration into fine-tuning Florence-2
☆339 Updated last year
Hon-Wong / VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
☆378 Updated 7 months ago
congvvc / HyperSeg
[CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
☆180 Updated last year
czg1225 / SlimSAM
[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim
☆354 Updated 4 months ago
xiaomoguhz / DeCLIP
[CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
☆149 Updated last month
Vibashan / PosSAM
Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything
☆70 Updated last year
zamling / PSALM
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
☆269 Updated last year
robustsam / RobustSAM
RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)
☆364 Updated last year
pasqualedem / LabelAnything
Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
☆86 Updated last month
ClaudiaCuttano / SAMWISE
[CVPR 2025 Highlight] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation"
☆366 Updated 4 months ago
wkcn / TinyCLIP
[ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
☆125 Updated last year
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆336 Updated last year
microsoft / LLM2CLIP
LLM2CLIP significantly improves already state-of-the-art CLIP models.
☆623 Updated last week
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆252 Updated last year
xk-huang / segment-caption-anything
[CVPR'24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA) , links for downloadin…
☆231 Updated last year
lucasjinreal / Namo-R1
A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.
☆249 Updated 9 months ago
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.
☆249 Updated 5 months ago
stevebottos / owl-vit-object-detection
Object detection based on OWL-ViT
☆67 Updated 2 years ago
PRITHIVSAKTHIUR / FineTuning-SigLIP-2
Fine-Tuning SigLIP 2 for Single/Multi-Label Image Classification. Image classification vision-language encoder model fine-tuned for Imag…
☆48 Updated 6 months ago
360CVGroup / FG-CLIP
New generation of CLIP with fine grained discrimination capability, ICML2025
☆545 Updated 3 months ago