anyantudre / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
☆47Updated 9 months ago
Alternatives and similar repositories for Florence-2-Vision-Language-Model:
Users who are interested in Florence-2-Vision-Language-Model are comparing it to the repositories listed below
- ☆64Updated last year
- ☆61Updated last year
- Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark☆118Updated last week
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…☆84Updated last year
- Scaling Vision Pre-Training to 4K Resolution☆124Updated last month
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything☆61Updated last year
- SSA + FastSAM/Semantic Fast Segment Anything, or Fast Semantic Segment Anything☆97Updated last year
- [Fully open] [Encoder-free MLLM] Vision as LoRA☆138Updated last week
- This repository is for the first survey on SAM & SAM2 for Videos.☆43Updated this week
- ☆91Updated 9 months ago
- Codebase for the Recognize Anything Model (RAM)☆78Updated last year
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation☆86Updated 3 weeks ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.☆18Updated 2 years ago
- [ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces☆240Updated 2 months ago
- One summary of efficient segment anything models☆94Updated 8 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆58Updated 2 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆50Updated 3 months ago
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".☆129Updated 4 months ago
- EMOv2: Pushing 5M Vision Model Frontier☆45Updated 3 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆60Updated 2 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation"☆89Updated 2 months ago
- Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor☆104Updated 10 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 7 months ago
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution☆46Updated last month
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆154Updated 7 months ago
- ☆27Updated 3 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆88Updated 6 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆182Updated 3 months ago
- Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation☆49Updated last month
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding☆56Updated 6 months ago