ProGamerGov / VLM-Captioning-Tools
Python scripts to use for captioning images with VLMs
☆36 · Updated 5 months ago
Alternatives and similar repositories for VLM-Captioning-Tools:
Users who are interested in VLM-Captioning-Tools are comparing it to the repositories listed below.
- SigLIP-based Aesthetic Score Predictor ☆172 · Updated last month
- AnimationDiff with train ☆120 · Updated 10 months ago
- Fine-Grained Subject-Specific Attribute Expression Control in T2I Models ☆112 · Updated 7 months ago
- Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers ☆47 · Updated 3 months ago
- Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization ☆173 · Updated last month
- A diffusion training toolbox based on diffusers and existing SOTA methods, including Dreambooth, Textual Inversion, LoRA, Custom Diffusion… ☆76 · Updated 3 months ago
- Subjects200K dataset ☆90 · Updated this week
- ☆80 · Updated 3 months ago
- [ICLR 2024] Code for FreeNoise based on AnimateDiff ☆106 · Updated 11 months ago
- A retrain of AnimateDiff to be conditional on an init image ☆33 · Updated last year
- ☆32 · Updated 6 months ago
- ☆90 · Updated 11 months ago
- Create transparent image with Diffusers! ☆48 · Updated 4 months ago
- [CVPR 2024] Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models ☆65 · Updated 9 months ago
- ☆47 · Updated 8 months ago
- [NeurIPS 2024] 💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching ☆141 · Updated 2 months ago
- ☆77 · Updated last year
- Official PyTorch implementation of paper "CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up". ☆186 · Updated 2 weeks ago
- [ECCV 2024] Official PyTorch implementation of "Getting it Right: Improving Spatial Consistency in Text-to-Image Models" ☆98 · Updated 6 months ago
- Official Repo for Paper "OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision" ☆68 · Updated last month
- 🔥 [CVPR 2024] The official repo for Zero-Painter! ☆64 · Updated 7 months ago
- InstantUnify: Integrates Multimodal LLM into Diffusion Models 🔥 ☆39 · Updated 5 months ago
- an unofficial implementation of dreamtuner ☆24 · Updated 10 months ago
- PyTorch implementation of "SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation" (CVPR 2024) ☆100 · Updated 5 months ago
- MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance (ACM MM 2024) ☆117 · Updated 2 months ago
- More suitable IP-Adapter for the DiT architecture ☆28 · Updated 6 months ago
- [SIGGRAPH Asia 2024 (Journal Track)] StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter ☆213 · Updated 6 months ago
- ☆129 · Updated 2 months ago
- ☆90 · Updated 4 months ago
- InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation 🔥 ☆81 · Updated 6 months ago