retkowsky / florence-2
Florence-2
☆67 · Updated 3 months ago
Alternatives and similar repositories for florence-2
Users who are interested in florence-2 are comparing it to the libraries listed below.
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ☆184 · Updated 4 months ago
- Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan… ☆61 · Updated 11 months ago
- Referring to any person or object given a natural language description. Code base for RexSeek and HumanRef Benchmark ☆132 · Updated last month
- Quick exploration into fine-tuning Florence-2 ☆316 · Updated 8 months ago
- Official repo of the Griffon series, including v1 (ECCV 2024), v2, and G ☆212 · Updated last week
- Codebase for the Recognize Anything Model (RAM) ☆79 · Updated last year
- ComfyUI YOLO-World Integration ☆42 · Updated 11 months ago
- ☆177 · Updated 7 months ago
- ✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆159 · Updated 5 months ago
- Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770). ☆156 · Updated 8 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 8 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆156 · Updated 8 months ago
- Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model" ☆411 · Updated 2 months ago
- ☆55 · Updated 6 months ago
- A Simple Framework of Small-scale LMMs for Video Understanding ☆65 · Updated 2 weeks ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆181 · Updated 5 months ago
- LinVT: Empower Your Image-level Large Language Model to Understand Videos ☆77 · Updated 5 months ago
- A new generation of CLIP with fine-grained discrimination capability (ICML 2025) ☆158 · Updated 2 weeks ago
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai. ☆55 · Updated last month
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆223 · Updated 3 months ago
- Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion ☆325 · Updated 2 months ago
- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection ☆87 · Updated 2 months ago
- The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning" ☆143 · Updated this week
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆192 · Updated 2 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆59 · Updated 3 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆205 · Updated last week
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆221 · Updated 11 months ago
- A family of highly capable yet efficient large multimodal models ☆183 · Updated 9 months ago
- (CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of La… ☆209 · Updated last month
- [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection ☆170 · Updated 2 months ago