emanuelevivoli / ComiCap
[ECCV-W] Official repo for the paper "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
☆14 · Updated last year
Alternatives and similar repositories for ComiCap
Users who are interested in ComiCap are comparing it to the libraries listed below.
- Comics Dataset Framework for Comics Understanding ☆39 · Updated 5 months ago
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding" ☆134 · Updated last year
- Data release for the ImageInWords (IIW) paper. ☆224 · Updated last year
- An open source implementation of CLIP (With TULIP Support) ☆165 · Updated 8 months ago
- ConceptAttention: A method for interpreting multi-modal diffusion transformers. ☆416 · Updated 3 weeks ago
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆90 · Updated 2 weeks ago
- Matryoshka Multimodal Models ☆122 · Updated last year
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆83 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆185 · Updated last year
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?" ☆149 · Updated last year
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆250 · Updated last year
- A one-stop library to standardize the inference and evaluation of all the conditional image generation models. [ICLR 2024] ☆177 · Updated 2 months ago
- Repository for "CoMix: Comprehensive Benchmark for Multi-Task Comic Understanding" ☆16 · Updated last year
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆158 · Updated 6 months ago
- When do we not need larger vision models? ☆412 · Updated last year
- Official implementation for "Stable Flow: Vital Layers for Training-Free Image Editing" [CVPR 2025] ☆406 · Updated 8 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆274 · Updated 2 months ago
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) ☆104 · Updated last year
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆298 · Updated 11 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 6 months ago
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs ☆413 · Updated last month
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆52 · Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆320 · Updated last year
- LLaVA-Interactive-Demo ☆380 · Updated last year
- AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more samp… ☆310 · Updated last year
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆279 · Updated last year
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆482 · Updated last year
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆471 · Updated 2 years ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆47 · Updated last year
- Open reproduction of MUSE for fast text2image generation. ☆359 · Updated last year