ANYANTUDRE / Florence-2-Vision-Language-Model
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
☆13 Updated 4 months ago
Related projects
Alternatives and complementary repositories for Florence-2-Vision-Language-Model
- EdgeSAM model for use with Autodistill. ☆25 Updated 5 months ago
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting. ☆34 Updated last year
- Code for the paper "Manipulating Embeddings of Stable Diffusion Prompts". ☆11 Updated 3 months ago
- [CVPR 2024] Official PyTorch implementation of "ECLIPSE: Revisiting the Text-to-Image Prior for Efficient Image Generation" ☆60 Updated 6 months ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆34 Updated 2 weeks ago
- ☆33 Updated 10 months ago
- ImageSlider custom component for Gradio. ☆30 Updated 6 months ago
- A simple demo for using the Grounding DINO and Segment Anything v2 models together ☆16 Updated 3 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆65 Updated 6 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling ☆30 Updated 7 months ago
- [CVPR 2024] Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models ☆64 Updated 7 months ago
- Modern Stable Diffusion models family - Fluently ☆26 Updated 5 months ago
- Official repo: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing ☆50 Updated 7 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 Updated last week
- The official repo of continuous speculative decoding ☆16 Updated this week
- ☆28 Updated 10 months ago
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching. ☆26 Updated this week
- Florence-2 ☆42 Updated 5 months ago
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆22 Updated last year
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆59 Updated 3 months ago
- Image/instance retrieval using CLIP, a self-supervised learning model ☆23 Updated last year
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆30 Updated 2 months ago
- SAM-CLIP module for use with Autodistill. ☆12 Updated last year
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models" ☆14 Updated last week
- Implementation of the tryOnDiffusion paper ☆19 Updated last year
- ☆56 Updated 2 weeks ago
- Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning. ☆14 Updated 11 months ago
- Codebase for the Recognize Anything Model (RAM) ☆64 Updated 11 months ago