anyantudre / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
☆54Updated 10 months ago
Alternatives and similar repositories for Florence-2-Vision-Language-Model
Users who are interested in Florence-2-Vision-Language-Model are comparing it to the libraries listed below.
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆60Updated 2 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆64Updated 3 months ago
- EMOv2: Pushing 5M Vision Model Frontier☆46Updated 4 months ago
- Scaling Vision Pre-Training to 4K Resolution☆157Updated 2 weeks ago
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".☆139Updated 5 months ago
- Referring to any person or object given a natural language description. Code base for RexSeek and HumanRef Benchmark☆124Updated last month
- Florence-2☆64Updated 3 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA☆161Updated last month
- New generation of CLIP with fine grained discrimination capability, ICML2025☆89Updated this week
- ☆61Updated last year
- An open-source implementation for fine-tuning SmolVLM.☆26Updated 2 weeks ago
- Odd-One-Out: Anomaly Detection by Comparing with Neighbors (CVPR25)☆38Updated 5 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆50Updated 4 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆125Updated 6 months ago
- ☆92Updated 9 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆156Updated 7 months ago
- ☆181Updated this week
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).☆156Updated 7 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆63Updated 9 months ago
- [CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin…☆223Updated 7 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries".☆56Updated last year
- The official repository for the RealSyn dataset☆32Updated 2 weeks ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers"☆65Updated last month
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆90Updated 6 months ago
- Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor☆107Updated 10 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference.☆27Updated 8 months ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything☆62Updated last year
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.☆123Updated 9 months ago
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation☆122Updated 7 months ago
- ☆27Updated 10 months ago