retkowsky / florence-2
Florence-2
☆54Updated this week
Alternatives and similar repositories for florence-2:
Users who are interested in florence-2 are comparing it to the libraries listed below.
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆123Updated last month
- Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"☆333Updated this week
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models☆272Updated 9 months ago
- ☆155Updated 3 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆143Updated this week
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆151Updated 3 weeks ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant☆223Updated 5 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs☆174Updated this week
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.☆105Updated 5 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆147Updated 3 weeks ago
- ComfyUI YOLO-World Integration☆34Updated 6 months ago
- Quick exploration into fine-tuning Florence-2☆290Updated 4 months ago
- ☆340Updated 2 months ago
- PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM is accepted by CVPR 2024.☆195Updated 7 months ago
- Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-lan…☆23Updated 6 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆107Updated 2 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆130Updated 7 months ago
- OLA-VLM: Elevating Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆45Updated last month
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆128Updated last month
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆60Updated 5 months ago
- A family of highly capable yet efficient large multimodal models☆176Updated 4 months ago
- Data release for the ImageInWords (IIW) paper.☆205Updated 2 months ago
- ☆159Updated 6 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆73Updated 2 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions☆187Updated 6 months ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆134Updated 2 months ago
- Official repo of Griffon series including v1(ECCV 2024), v2, and G☆127Updated this week
- [ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces☆234Updated last year
- ☆28Updated last month