retkowsky / florence-2
Florence-2
☆38 Updated 4 months ago
Related projects
Alternatives and complementary repositories for florence-2
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models. ☆89 Updated 3 months ago
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024 ☆261 Updated 6 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training) ☆127 Updated 5 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆144 Updated this week
- VimTS: A Unified Video and Image Text Spotter ☆72 Updated 4 months ago
- Use Florence-2 to auto-label data for use in training fine-tuned object detection models. ☆59 Updated 2 months ago
- Quick exploration into fine-tuning Florence-2 ☆267 Updated last month
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆53 Updated 4 months ago
- ☆178 Updated last week
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆137 Updated this week
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆51 Updated last week
- A family of highly capable yet efficient large multimodal models ☆161 Updated 2 months ago
- ☆66 Updated 6 months ago
- Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model" ☆306 Updated 3 weeks ago
- ☆258 Updated this week
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 Updated last month
- 🔥🔥First-ever hour-scale video understanding models ☆150 Updated last week
- Codebase for the Recognize Anything Model (RAM) ☆63 Updated 10 months ago
- Fine-tuning code for CLIP models ☆156 Updated 2 weeks ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone. ☆58 Updated 2 weeks ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data and a 14B LLM. ☆36 Updated 2 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆85 Updated 2 months ago
- ☆21 Updated 5 months ago
- Image Prompter for Gradio ☆73 Updated 10 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024) ☆140 Updated 3 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆62 Updated 2 weeks ago
- ☆152 Updated 4 months ago
- Long Context Transfer from Language to Vision ☆328 Updated 2 weeks ago
- Official code for infimm-hd ☆15 Updated 2 months ago
- Data release for the ImageInWords (IIW) paper. ☆200 Updated 5 months ago