retkowsky / florence-2

Florence-2

☆71

Alternatives and similar repositories for florence-2

Users interested in florence-2 are comparing it to the libraries listed below.

IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆208 · Updated last month

anyantudre / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan…
☆114 · Updated last year

IDEA-Research / RexSeek
[ICCV2025] Referring to any person or object given a natural-language description. Codebase for RexSeek and the HumanRef Benchmark.
☆172 · Updated last month

IDEA-Research / Rex-Thinker
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
☆127 · Updated 4 months ago

kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆145 · Updated 3 weeks ago

lucasjinreal / Namo-R1
A real-time VLM for CPU in 500M parameters. Surpasses Moondream2 and SmolVLM. Easy to train from scratch.
☆239 · Updated 6 months ago

andimarafioti / florence2-finetuning
Quick exploration into fine-tuning Florence-2
☆334 · Updated last year

OPPOMKLab / recognize-anything
Codebase for the Recognize Anything Model (RAM)
☆87 · Updated last year

xmu-xiaoma666 / Multimodal-Open-O1
Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…
☆29 · Updated last year

wkcn / TinyCLIP
[ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
☆115 · Updated last year

SHI-Labs / VCoder
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
☆279 · Updated last year

ZhangXJ199 / TinyLLaVA-Video
A Simple Framework of Small-scale LMMs for Video Understanding
☆96 · Updated 5 months ago

NExT-ChatV / NExT-Chat
Code for the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
☆255 · Updated last year

jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.
☆242 · Updated 3 months ago

thunlp / Migician
[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
☆80 · Updated 5 months ago

autodistill / autodistill-florence-2
Use Florence-2 to auto-label data for training fine-tuned object detection models.
☆67 · Updated last year

PhoenixZ810 / MG-LLaVA
Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).
☆158 · Updated last year

2U1 / Molmo-Finetune
An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.
☆58 · Updated 6 months ago

Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆412 · Updated 6 months ago

yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆162 · Updated 10 months ago

WePOINTS / WePOINTS
☆186 · Updated 9 months ago

OpenGVLab / InternVL-MMDetSeg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
☆105 · Updated last year

cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆126 · Updated last year

dvlab-research / VisionReasoner
Vision Manus: Your versatile visual AI assistant
☆293 · Updated last month

2U1 / SmolVLM-Finetune
An open-source implementation for fine-tuning SmolVLM.
☆55 · Updated 2 months ago

microsoft / LLM2CLIP
LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA than ever.
☆563 · Updated 4 months ago

Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆210 · Updated last year

Leon1207 / Video-RAG-master
✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…
☆336 · Updated 2 weeks ago

gls0425 / LinVT
LinVT: Empower Your Image-level Large Language Model to Understand Videos
☆82 · Updated 10 months ago

StevenGrove / ComfyUI-YOLOWorld
ComfyUI YOLO-World Integration
☆48 · Updated last year