foundation-multimodal-models/CAPTURE

foundation-multimodal-models / CAPTURE

☆86

Alternatives and similar repositories for CAPTURE

Users who are interested in CAPTURE are comparing it to the repositories listed below.

facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆197 · Jul 1, 2024 · Updated 2 years ago
ypwang61 / negCLIPLoss_NormSim
[NeurIPS 2024 Spotlight] CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning.
☆14 · Dec 12, 2024 · Updated last year
foundation-multimodal-models / ConBench
[NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
☆39 · Oct 23, 2024 · Updated last year
rangarodrigo / EN1060Lectures
EN1060 lectures
☆11 · Jan 25, 2026 · Updated 5 months ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆172 · Jul 30, 2024 · Updated last year
bronyayang / HallE_Control
HallE-Control: Controlling Object Hallucination in LMMs
☆32 · Apr 10, 2024 · Updated 2 years ago
Naman-Choudhary-AI-ML / RLFinanceTimeSeries-OneNet
We are developing a time series forecasting model using reinforcement learning, based on OneNet, for stock market data prediction.
☆10 · Apr 19, 2024 · Updated 2 years ago
zhuang-li / FactualSceneGraph
[ACL 2023 Findings] FACTUAL dataset, the textual scene graph parser trained on FACTUAL.
☆131 · Jun 15, 2026 · Updated last month
LuFan31 / CompreCap
CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
☆39 · Mar 21, 2025 · Updated last year
mtanti / rnn-role
Code used by the paper "What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?".
☆14 · Sep 25, 2017 · Updated 8 years ago
lcyfrank / PlateGen
Generate images of Chinese license plates
☆11 · Feb 8, 2021 · Updated 5 years ago
yychai74 / Generative-MultiEmo
Source code for NLPCC 2022 paper: Prompt-Based Generative Multi-label Emotion Prediction with Label Contrastive Learning
☆23 · Jul 5, 2023 · Updated 3 years ago
zjucsq / PLA
[ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision
☆12 · Sep 17, 2023 · Updated 2 years ago
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆98 · Oct 19, 2024 · Updated last year
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆52 · Jul 16, 2024 · Updated 2 years ago
JierunChen / Ref-L4
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
☆61 · Dec 28, 2024 · Updated last year
Kangningthu / SUM
Uncertainty-aware Fine-tuning of Segmentation Foundation Models (NeurIPS 2024).
☆16 · Jan 9, 2025 · Updated last year
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"
☆152 · Jun 13, 2024 · Updated 2 years ago
bytedance / UniVR
☆24 · Jul 16, 2026 · Updated last week
thu-spmi / RAG-CoT
Code for "An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought"
☆18 · Jul 27, 2024 · Updated last year
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
lucasjinreal / wnnx_models
Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron`
☆12 · Jun 22, 2022 · Updated 4 years ago
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273 · Feb 11, 2025 · Updated last year
penghao-wu / visual_jigsaw
☆78 · Apr 9, 2026 · Updated 3 months ago
MeriamAffes / Forecasting-time-series-using-RL
We implemented a model to predict the market price of a nonlinear chaotic time series, using reinforcement learning
☆17 · Dec 4, 2018 · Updated 7 years ago
Letian2003 / MM_INF
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…
☆40 · Jun 4, 2025 · Updated last year
junyangwang0410 / AMBER
An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation
☆173 · Jan 15, 2024 · Updated 2 years ago
tiangeluo / DiffuRank
View Selection for 3D Captioning via Diffusion Ranking
☆34 · Jul 3, 2025 · Updated last year
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
baaaad / ECE
[ECCV'22 Poster] Explicit Image Caption Editing
☆22 · Nov 30, 2022 · Updated 3 years ago
ChenShawn / MultiModal-Jupyter-Sandbox
Simple code sandbox supporting jupyter notebook style code execution. Used for agent training
☆24 · Dec 5, 2025 · Updated 7 months ago
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆255 · Apr 3, 2024 · Updated 2 years ago
WildVision-AI / WildVision-Bench
☆17 · Oct 21, 2024 · Updated last year
frankaging / Interchange-Intervention-Training
The codebase for Inducing Causal Structure for Interpretable Neural Networks
☆11 · Dec 3, 2021 · Updated 4 years ago
wannature / Detective-A-Dynamic-Integrated-Uncertainty-Valuation-Framework
Pytorch implementation of Detective
☆13 · Jul 11, 2024 · Updated 2 years ago
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆34 · Nov 27, 2025 · Updated 7 months ago
Hesse73 / RLVR-Directions
Source Code for our ICLR'26 paper
☆17 · Feb 22, 2026 · Updated 5 months ago
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago