emanuelevivoli / ComiCap
[ECCV-W] Official repo for the paper "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
☆14Updated last year
Alternatives and similar repositories for ComiCap
Users who are interested in ComiCap are comparing it to the libraries listed below.
- Comics Dataset Framework for Comics Understanding☆33Updated 3 months ago
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding"☆131Updated 11 months ago
- Data release for the ImageInWords (IIW) paper.☆223Updated last year
- An open source implementation of CLIP (With TULIP Support)☆163Updated 7 months ago
- Repository for "CoMix: Comprehensive Benchmark for Multi-Task Comic Understanding"☆15Updated last year
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning☆157Updated 4 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional image generation models. [ICLR 2024]☆175Updated last week
- ConceptAttention: A method for interpreting multi-modal diffusion transformers.☆354Updated last month
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"☆82Updated last year
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"☆143Updated last year
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation☆47Updated last year
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"☆248Updated 10 months ago
- Matryoshka Multimodal Models☆120Updated 10 months ago
- DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual A…☆563Updated 3 weeks ago
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi…☆479Updated last year
- [IEEE TPAMI] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation☆335Updated 6 months ago
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic…☆89Updated 11 months ago
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)☆184Updated last year
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)☆102Updated last year
- ☆69Updated last year
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆220Updated last month
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)☆129Updated last month
- Official repository for the MMFM challenge☆25Updated last year
- Official PyTorch implementation of the WACV 2025 Oral paper "Composed Image Retrieval for Training-FREE DOMain Conversion".☆45Updated 3 months ago
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)☆92Updated last year
- When do we not need larger vision models?☆412Updated 10 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆79Updated last year
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆188Updated 2 years ago
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗☆292Updated 9 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆273Updated this week