kjerk / instructblip-pipeline
A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
☆30Updated last year
Alternatives and similar repositories for instructblip-pipeline:
Users who are interested in instructblip-pipeline are comparing it to the libraries listed below.
- Fine-tune your Florence-2 model easily☆20Updated 8 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"☆128Updated 9 months ago
- CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models☆18Updated 6 months ago
- Official implementation of UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified …☆68Updated 4 months ago
- ☆30Updated last year
- Implementation of "SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing"☆85Updated last year
- This repository implements the idea of "caption upsampling" from DALL-E 3 with Zephyr-7B and gathers results with SDXL.☆152Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 7 months ago
- [ECCVW 2024] Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models☆27Updated 2 months ago
- Official Implementation for paper: Negative Token Merging: Image-based Adversarial Feature Guidance☆73Updated 2 months ago
- Collection of scripts to build small-scale datasets for fine-tuning video generation models.☆51Updated 2 weeks ago
- (IA)^3 for Stable Diffusion☆35Updated last year
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆133Updated 2 months ago
- ☆167Updated last year
- Scripts for use with LongCLIP, including fine-tuning Long-CLIP☆59Updated 3 weeks ago
- Python scripts to use for captioning images with VLMs☆39Updated 8 months ago
- TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder☆53Updated 2 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆129Updated last year
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.☆225Updated last year
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model☆102Updated last week
- Official repo for StableLLAVA☆95Updated last year
- Unofficial implementation. Stable diffusion model trained by AI Feedback-Based Self-Training Direct Preference Optimization.☆61Updated last year
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆89Updated 3 weeks ago
- ☆24Updated last year
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆45Updated 6 months ago
- Public code release for the paper "ProCreate, Don’t Reproduce! Propulsive Energy Diffusion for Creative Generation"☆37Updated 4 months ago
- Generates control vectors for use with llama.cpp in GGUF format.☆19Updated last week
- Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024)☆169Updated 7 months ago
- Official implementation of "Make It Count: Text-to-Image Generation with an Accurate Number of Objects" (CVPR 2025)☆69Updated 2 weeks ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆48Updated last month