emanuelevivoli / ComiCap
[ECCV-W] Official repo for the paper "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
☆12 · Updated 5 months ago
Alternatives and similar repositories for ComiCap:
Users interested in ComiCap also compare it to the repositories listed below.
- Comics Dataset Framework for Comics Understanding ☆17 · Updated last month
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding" ☆109 · Updated 3 months ago
- An open source implementation of CLIP (With TULIP Support) ☆128 · Updated last month
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆42 · Updated 6 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆50 · Updated 5 months ago
- An open-source implementation for fine-tuning SmolVLM. ☆24 · Updated 3 weeks ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆121 · Updated last week
- A Gradio component that can be used to annotate images with bounding boxes. ☆49 · Updated last month
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆242 · Updated 3 months ago
- Official PyTorch implementation of TokenSet. ☆114 · Updated last month
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC) ☆88 · Updated last year
- Data release for the ImageInWords (IIW) paper. ☆209 · Updated 5 months ago
- Matryoshka Multimodal Models ☆99 · Updated 3 months ago
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆76 · Updated 11 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆175 · Updated 3 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?" ☆129 · Updated 10 months ago
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models". ☆102 · Updated 10 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 2 months ago
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆73 · Updated 4 months ago
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆51 · Updated 2 weeks ago
- LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆128 · Updated 3 weeks ago
- UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, a… ☆87 · Updated 2 weeks ago
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts" ☆77 · Updated 11 months ago