kjerk / instructblip-pipeline
A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
☆30 Updated last year
Alternatives and similar repositories for instructblip-pipeline:
Users who are interested in instructblip-pipeline are comparing it to the repositories listed below.
- Implementation of "SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing" ☆84 Updated last year
- ☆30 Updated last year
- ☆60 Updated 8 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ☆47 Updated this week
- ☆27 Updated last year
- Python scripts to use for captioning images with VLMs ☆36 Updated 5 months ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ☆224 Updated last year
- A multi-modal AI Model that can generate high quality novel videos with text, images, or video clips. ☆65 Updated last year
- Fine-tune your Florence-2 model easily ☆20 Updated 5 months ago
- ☆23 Updated 4 months ago
- CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models ☆17 Updated 4 months ago
- Public code release for the paper "ProCreate, Don’t Reproduce! Propulsive Energy Diffusion for Creative Generation" ☆37 Updated 2 months ago
- Simple extension for text-generation-webui that injects recent conversation history into the negative prompt with the goal of minimizing … ☆33 Updated last year
- Unofficial implementation. Stable diffusion model trained by AI Feedback-Based Self-Training Direct Preference Optimization. ☆59 Updated 10 months ago
- Official Implementation for paper: Negative Token Merging: Image-based Adversarial Feature Guidance ☆67 Updated last month
- Scripts for use with LongCLIP, including fine-tuning Long-CLIP ☆53 Updated 2 months ago
- ☆60 Updated last year
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆106 Updated 3 months ago
- sd3 dreambooth lora training book, adapted from the diffusers doc ☆42 Updated 7 months ago
- ☆55 Updated this week
- Let's try and finetune the OpenAI consistency decoder to work for SDXL ☆23 Updated last year
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ☆127 Updated 11 months ago
- A gradio based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images. ☆58 Updated 2 months ago
- ☆18 Updated 4 months ago
- Implementation of the premier Text to Video model from OpenAI ☆57 Updated 2 months ago
- ☆35 Updated 9 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?" ☆126 Updated 7 months ago
- [CVPR2024] The official implementation of paper Relation Rectification in Diffusion Model ☆45 Updated 4 months ago
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆33 Updated last month
- ☆16 Updated last year