kjerk / instructblip-pipeline
A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
☆30 · Updated last year
Alternatives and similar repositories for instructblip-pipeline:
Users interested in instructblip-pipeline are comparing it to the libraries listed below.
- Fine-tune your Florence-2 model easily ☆20 · Updated 9 months ago
- Implementation of "SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing" ☆86 · Updated last year
- CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models ☆20 · Updated 7 months ago
- ☆30 · Updated last year
- ☆60 · Updated last year
- Python scripts to use for captioning images with VLMs ☆39 · Updated this week
- Scripts for use with LongCLIP, including fine-tuning Long-CLIP ☆60 · Updated last month
- [CVPR2024] The official implementation of paper Relation Rectification in Diffusion Model ☆47 · Updated 7 months ago
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model ☆103 · Updated last month
- Unofficial implementation of a Stable Diffusion model trained with AI Feedback-Based Self-Training Direct Preference Optimization. ☆63 · Updated last year
- Official implementation of UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified … ☆68 · Updated 4 months ago
- Public code release for the paper "ProCreate, Don’t Reproduce! Propulsive Energy Diffusion for Creative Generation" ☆37 · Updated 5 months ago
- Merge safetensor files using the technique described in "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a… ☆77 · Updated 6 months ago
- ☆90 · Updated last year
- ☆24 · Updated last year
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ☆129 · Updated last year
- ☆127 · Updated 6 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional image generation models. (ICLR 2024) ☆168 · Updated last week
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆129 · Updated 10 months ago
- Extend BoxDiff to SDXL (SDXL-based layout-to-image generation) ☆23 · Updated 11 months ago
- Official Implementation for paper: Negative Token Merging: Image-based Adversarial Feature Guidance ☆73 · Updated 2 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ☆48 · Updated 2 months ago
- ☆171 · Updated last year
- [ECCVW 2024] Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models ☆28 · Updated 3 months ago
- ☆27 · Updated last year
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆134 · Updated last year
- BLIP2 captioning tool as an extension of AUTOMATIC's WebUI ☆60 · Updated 2 years ago
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) ☆87 · Updated 4 months ago
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆35 · Updated 2 months ago
- ☆13 · Updated 7 months ago