kjerk / instructblip-pipeline
A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
☆33Updated 2 years ago
Alternatives and similar repositories for instructblip-pipeline
Users who are interested in instructblip-pipeline are comparing it to the repositories listed below.
- SSD-1B, an open-source text-to-image model, outperforming previous versions by being 50% smaller and 60% faster than SDXL.☆179Updated last year
- CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models☆21Updated last year
- Unofficial implementation. Stable diffusion model trained by AI Feedback-Based Self-Training Direct Preference Optimization.☆65Updated last year
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.☆232Updated 2 years ago
- Implementation of "SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing"☆86Updated 2 years ago
- This repository implements the idea of "caption upsampling" from DALL-E 3 with Zephyr-7B and gathers results with SDXL.☆158Updated 2 years ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆132Updated last year
- XGEN-MM(BLIP3) Autocaptioning Tools☆17Updated last year
- Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024)☆178Updated last year
- Scripts for use with LongCLIP, including fine-tuning Long-CLIP☆63Updated 10 months ago
- Fine-tune your Florence-2 model easily☆21Updated last year
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆50Updated 11 months ago
- [CVPR2024] The official implementation of paper Relation Rectification in Diffusion Model☆48Updated last year
- lightweight LAMA inference wrapper☆26Updated 2 years ago
- Implementation of the text to video model LUMIERE from the paper: "A Space-Time Diffusion Model for Video Generation" by Google Research☆52Updated last year
- Data release for the ImageInWords (IIW) paper.☆224Updated last year
- Live2Diff: A Pipeline that processes Live video streams by a uni-directional video Diffusion model.☆199Updated last year
- Implementation of HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models☆175Updated 2 years ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it☆143Updated 2 years ago
- [SIGGRAPH Asia 2023] An interactive story visualization tool that supports multiple characters☆270Updated last year
- Official Implementation for paper: Negative Token Merging: Image-based Adversarial Feature Guidance☆75Updated 7 months ago
- A Diffusion training toolbox based on diffusers and existing SOTA methods, including Dreambooth, Textual Inversion, LoRA, Custom Diffusion…☆84Updated last year