ProGamerGov / VLM-Captioning-Tools
Python scripts to use for captioning images with VLMs
☆43Updated 4 months ago
Alternatives and similar repositories for VLM-Captioning-Tools
Users that are interested in VLM-Captioning-Tools are comparing it to the libraries listed below
- ☆91Updated last year
- AnimationDiff with train☆122Updated last year
- A Diffusion training toolbox based on diffusers and existing SOTA methods, including Dreambooth, Textual Inversion, LoRA, Custom Diffusion…☆81Updated 11 months ago
- ☆170Updated 10 months ago
- SigLIP-based Aesthetic Score Predictor☆307Updated 9 months ago
- Fine-Grained Subject-Specific Attribute Expression Control in T2I Models☆128Updated 6 months ago
- IP Adapter Instruct☆209Updated last year
- Unofficial implementation. Stable diffusion model trained by AI Feedback-Based Self-Training Direct Preference Optimization.☆65Updated last year
- [ICML 2025] Official PyTorch implementation of paper "Ultra-Resolution Adaptation with Ease".☆106Updated 4 months ago
- Create transparent image with Diffusers!☆58Updated 7 months ago
- InstantUnify: Integrates Multimodal LLM into Diffusion Models 🔥☆40Updated last year
- Consistency Distillation with Target Timestep Selection and Decoupled Guidance☆91Updated 8 months ago
- ☆236Updated last year
- Unofficial extension implementation of CausVid☆57Updated 4 months ago
- Extend BoxDiff to SDXL (SDXL-based layout-to-image generation)☆24Updated last year
- MuDI: Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models (NeurIPS 2024)☆97Updated 8 months ago
- [ICCV 2025] Code & Data for: SuperEdit - Rectifying and Facilitating Supervision for Instruction-Based Image Editing☆158Updated 2 months ago
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation☆234Updated last year
- PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation☆36Updated 10 months ago
- [ECCV 2024] AnyControl, a multi-control image synthesis model that supports any combination of user-provided control signals. A model that supports free-form user input of…☆128Updated last year
- Implementation of HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models☆175Updated 2 years ago
- Diffusion attentive attribution maps for interpreting Stable Diffusion for image-to-image attention.☆55Updated 8 months ago
- Subjects200K dataset☆118Updated 8 months ago
- Official Repo for Paper "OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision" [ICLR2025]☆127Updated 7 months ago
- [CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization☆247Updated 5 months ago
- [ECCV 2024] Official PyTorch implementation of "Getting it Right: Improving Spatial Consistency in Text-to-Image Models"☆100Updated last year
- Textual Inversion for DeepFloyd IF☆59Updated 2 years ago
- ☆102Updated last year
- A detailed diagram laying out the full Flux.1 [dev] architecture as shared by Black Forest Labs at https://github.com/black-forest-labs/f…☆78Updated 11 months ago
- Official PyTorch implementation of paper "CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up".☆210Updated 5 months ago