☆82Jul 27, 2024Updated last year
Alternatives and similar repositories for CAPTURE
Users that are interested in CAPTURE are comparing it to the libraries listed below
Sorting:
- [NeurIPS 2024 Spotlight] CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning.☆14Dec 12, 2024Updated last year
- Densely Captioned Images (DCI) dataset repository.☆197Jul 1, 2024Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- Various test models in WNNX format. It can view with `pip install wnetron && wnetron`☆12Jun 22, 2022Updated 3 years ago
- Generate images of Chinese license plates☆11Feb 8, 2021Updated 5 years ago
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision☆12Sep 17, 2023Updated 2 years ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆171Jul 30, 2024Updated last year
- [ACL 2023 Findings] FACTUAL dataset, the textual scene graph parser trained on FACTUAL.☆126Nov 11, 2025Updated 3 months ago
- HallE-Control: Controlling Object Hallucination in LMMs☆31Apr 10, 2024Updated last year
- Uncertainty-aware Fine-tuning of Segmentation Foundation Models (NeurIPS 2024).☆14Jan 9, 2025Updated last year
- ☆13Sep 5, 2023Updated 2 years ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"☆149Jun 13, 2024Updated last year
- ☆23Oct 15, 2022Updated 3 years ago
- Code for "An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought"☆17Jul 27, 2024Updated last year
- CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning☆38Mar 21, 2025Updated 11 months ago
- ☆16Oct 21, 2024Updated last year
- (ICLR 2026)Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’☆59Jan 26, 2026Updated last month
- [SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images☆44Nov 19, 2025Updated 3 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- [ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset☆25May 29, 2025Updated 9 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Oct 25, 2024Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.☆255Feb 11, 2025Updated last year
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.☆89Feb 17, 2025Updated last year
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness☆26May 16, 2025Updated 9 months ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models☆262Aug 5, 2025Updated 7 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆94Sep 14, 2024Updated last year
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆51Oct 31, 2024Updated last year
- Official Implementation of ISR-DPO:Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆23Nov 25, 2025Updated 3 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆86Oct 26, 2025Updated 4 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…☆945Aug 5, 2025Updated 7 months ago
- A benchmark for testing memorization abilities of LMs☆22Oct 15, 2024Updated last year
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution☆58Mar 4, 2025Updated last year
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆96Oct 19, 2024Updated last year
- [NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion☆100Oct 29, 2025Updated 4 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆52Jul 16, 2024Updated last year
- Official implementation of "OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging".☆43Oct 30, 2025Updated 4 months ago
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…☆27Mar 29, 2024Updated last year
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models☆51Feb 23, 2026Updated last week
- An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation☆155Jan 15, 2024Updated 2 years ago