ANYANTUDRE / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
☆23Updated 6 months ago
Alternatives and similar repositories for Florence-2-Vision-Language-Model:
Users who are interested in Florence-2-Vision-Language-Model are comparing it to the libraries listed below.
- OLA-VLM: Elevating Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆45Updated last month
- Florence-2☆54Updated this week
- Official PyTorch Implementation of Self-emerging Token Labeling☆32Updated 9 months ago
- EdgeSAM model for use with Autodistill.☆26Updated 7 months ago
- ☆34Updated 11 months ago
- ☆28Updated last month
- ☆35Updated 7 months ago
- EfficientSAM + YOLO World base model for use with Autodistill.☆9Updated 10 months ago
- [ECCV 2024] Parrot Captions Teach CLIP to Spot Text☆63Updated 4 months ago
- Stable Diffusion in TensorRT 8.5+☆14Updated last year
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆60Updated 5 months ago
- Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.☆15Updated last year
- ☆20Updated 3 weeks ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Updated 9 months ago
- Official repo: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing☆51Updated 9 months ago
- Modern Stable Diffusion models family - Fluently☆28Updated 7 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆38Updated 9 months ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆61Updated 2 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆24Updated last year
- Implementation of ViTAR: Vision Transformer with Any Resolution in PyTorch☆30Updated 2 months ago
- ☆47Updated last month
- ☆10Updated 2 weeks ago
- ☆29Updated last month
- ☆32Updated 7 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆41Updated 5 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆66Updated 8 months ago
- Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX.☆46Updated 9 months ago
- ComfyUI YOLO-World Integration☆34Updated 6 months ago
- Diffusers training with mmengine☆98Updated 11 months ago
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)☆59Updated 3 months ago