emanuelevivoli / ComiCap
[ECCV-W] Official repo for the paper "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
☆14 Updated last year
Alternatives and similar repositories for ComiCap
Users interested in ComiCap are comparing it to the repositories listed below:
- Comics Dataset Framework for Comics Understanding ☆32 Updated 2 months ago
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding" ☆129 Updated 10 months ago
- Data release for the ImageInWords (IIW) paper. ☆222 Updated last year
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆246 Updated 10 months ago
- ConceptAttention: A method for interpreting multi-modal diffusion transformers. ☆352 Updated last week
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆143 Updated last year
- A one-stop library to standardize the inference and evaluation of all the conditional image generation models. [ICLR 2024] ☆173 Updated last month
- Open reproduction of MUSE for fast text2image generation. ☆355 Updated last year
- An open source implementation of CLIP (With TULIP Support) ☆163 Updated 6 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆274 Updated 11 months ago
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆284 Updated 9 months ago
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆182 Updated last year
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆479 Updated last year
- Matryoshka Multimodal Models ☆115 Updated 10 months ago
- Code for instruction-tuning Stable Diffusion. ☆245 Updated last year
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆219 Updated last month
- [NeurIPS2023] This is the official code of the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation" ☆237 Updated last year
- Repository for "CoMix: Comprehensive Benchmark for Multi-Task Comic Understanding" ☆15 Updated last year
- ☆65 Updated 2 years ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆83 Updated 3 months ago
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) ☆100 Updated 11 months ago
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆155 Updated 3 months ago
- Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper. ☆80 Updated last year
- M4 experiment logbook ☆57 Updated 2 years ago
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆52 Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆320 Updated last year
- Densely Captioned Images (DCI) dataset repository. ☆192 Updated last year
- Easily compute clip embeddings from video frames ☆147 Updated 2 years ago
- Official implementation for "Stable Flow: Vital Layers for Training-Free Image Editing" [CVPR 2025] ☆398 Updated 5 months ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆470 Updated last year