emanuelevivoli / ComiCap
[ECCV-W] Official repo for the paper "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
☆14Updated last year
Alternatives and similar repositories for ComiCap
Users who are interested in ComiCap are comparing it to the libraries listed below.
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding"☆133Updated last year
- Comics Dataset Framework for Comics Understanding☆34Updated 4 months ago
- Data release for the ImageInWords (IIW) paper.☆224Updated last year
- A one-stop library to standardize the inference and evaluation of all the conditional image generation models. [ICLR 2024]☆175Updated last month
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"☆248Updated 11 months ago
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗☆297Updated 10 months ago
- Implementation of Key-Locked Rank One Editing, from Nvidia AI☆237Updated 2 years ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"☆147Updated last year
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"☆82Updated last year
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.☆160Updated last year
- Train VAE like a boss☆311Updated last year
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi…☆480Updated last year
- Open reproduction of MUSE for fast text2image generation.☆359Updated last year
- ☆65Updated 2 years ago
- ConceptAttention: A method for interpreting multi-modal diffusion transformers.☆407Updated last month
- DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual A…☆574Updated last month
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)☆102Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)☆185Updated last year
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic…☆90Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"☆320Updated last year
- E5-V: Universal Embeddings with Multimodal Large Language Models☆273Updated last month
- Matryoshka Multimodal Models☆121Updated 11 months ago
- Code for instruction-tuning Stable Diffusion.☆247Updated last year
- Implementation of Lumiere, SOTA text-to-video generation from Google DeepMind, in PyTorch☆281Updated last year
- Huggingface-compatible SDXL Unet implementation that is readily hackable☆435Updated 2 years ago
- ☆48Updated 10 months ago
- When do we not need larger vision models?☆413Updated 11 months ago
- Official code for "RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control"☆403Updated 9 months ago
- Repository for "CoMix: Comprehensive Benchmark for Multi-Task Comic Understanding"☆15Updated last year
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆84Updated 5 months ago