ANYANTUDRE / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
☆26 · updated 7 months ago
Alternatives and similar repositories for Florence-2-Vision-Language-Model:
Users who are interested in Florence-2-Vision-Language-Model are comparing it to the libraries listed below:
- EdgeSAM model for use with Autodistill. (☆26, updated 8 months ago)
- Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning. (☆15, updated last year)
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… (☆30, updated 7 months ago)
- minisora-DiT, a DiT reproduction based on XTuner from the open-source community MiniSora (☆40, updated 10 months ago)
- The official repo of continuous speculative decoding (☆24, updated 3 months ago)
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… (☆29, updated 4 months ago)
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data (☆21, updated 6 months ago)
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing (☆67, updated 9 months ago)
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. (☆61, updated 6 months ago)
- Florence-2 (☆59, updated last week)
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 (☆49, updated 3 weeks ago)
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation. (☆35, updated 8 months ago)
- Python scripts performing open-vocabulary object detection using the YOLO-World model in ONNX. (☆47, updated 10 months ago)
- Code for the paper "Manipulating Embeddings of Stable Diffusion Prompts". (☆12, updated 6 months ago)
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… (☆30, updated 4 months ago)
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. (☆18, updated 2 years ago)
- SAM-CLIP module for use with Autodistill. (☆13, updated last year)
- [ECCV 2024] Parrot Captions Teach CLIP to Spot Text (☆63, updated 5 months ago)
- LAVIS - A One-stop Library for Language-Vision Intelligence (☆47, updated 6 months ago)
- Stable Diffusion in TensorRT 8.5+ (☆14, updated last year)
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. (☆47, updated last week)
- [ICCV 2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance (☆81, updated 7 months ago)
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting. (☆34, updated last year)
- ComfyUI YOLO-World Integration (☆38, updated 7 months ago)
- This repository is for the first survey on SAM for videos. (☆32, updated 3 weeks ago)