emanuelevivoli / ComiCap
[ECCV-W] Official repo for the paper "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
☆14 Updated last year
Alternatives and similar repositories for ComiCap
Users who are interested in ComiCap compare it with the repositories listed below
- Comics Dataset Framework for Comics Understanding ☆38 Updated 5 months ago
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding" ☆133 Updated last year
- Data release for the ImageInWords (IIW) paper. ☆224 Updated last year
- A one-stop library to standardize the inference and evaluation of all the conditional image generation models. [ICLR 2024] ☆176 Updated last month
- An open source implementation of CLIP (With TULIP Support) ☆165 Updated 8 months ago
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆249 Updated last year
- Matryoshka Multimodal Models ☆121 Updated last year
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆149 Updated last year
- Open reproduction of MUSE for fast text2image generation. ☆359 Updated last year
- When do we not need larger vision models? ☆412 Updated 11 months ago
- Repository for "CoMix: Comprehensive Benchmark for Multi-Task Comic Understanding" ☆16 Updated last year
- Densely Captioned Images (DCI) dataset repository. ☆195 Updated last year
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆83 Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆320 Updated last year
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆158 Updated 5 months ago
- ConceptAttention: A method for interpreting multi-modal diffusion transformers. ☆414 Updated 2 weeks ago
- ☆576 Updated last year
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆129 Updated 2 months ago
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati… ☆99 Updated last year
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆192 Updated 2 years ago
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆90 Updated last week
- ☆180 Updated 2 months ago
- This is the official repository for the LENS (Large Language Models Enhanced to See) system. ☆356 Updated 6 months ago
- ☆192 Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆185 Updated last year
- Implementation of Key-Locked Rank One Editing, from Nvidia AI ☆237 Updated 2 years ago
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆481 Updated last year
- DataComp: In search of the next generation of multimodal datasets ☆767 Updated 9 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆47 Updated last year
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆272 Updated last month