danaesavi / ImageChainLinks

This repository is associated with the research paper titled ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models

☆14

Alternatives and similar repositories for ImageChain

Users that are interested in ImageChain are comparing it to the libraries listed below

Sorting:

princeton-pli / VLM_S2H
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
☆15Updated 5 months ago
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆18Updated last year
top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆19Updated 10 months ago
philippe-eecs / small-vision
A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.
☆34Updated last year
ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆21Updated last year
ml-jku / semantic-image-text-alignment
☆24Updated 2 years ago
lzw-lzw / UnifiedMLLM
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
☆22Updated last year
locuslab / llava-token-compression
☆44Updated 11 months ago
ZihaoHuang-notabot / Ultra-Sparse-Memory-Network
☆30Updated last month
MarkXCloud / CSpD
The official repo of continuous speculative decoding
☆30Updated 7 months ago
Optimization-AI / FastCLIP
Distributed Optimization Infra for learning CLIP models
☆27Updated last year
Yui010206 / CREMA
[ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
☆53Updated 4 months ago
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆44Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆61Updated last year
lyan62 / vlm-info-loss
☆20Updated last month
mbzuai-oryx / Agent-X
Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆31Updated this week
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆54Updated 8 months ago
arubique / OCCAM
This is an implementation of the paper "Are We Done with Object-Centric Learning?"
☆11Updated last month
elad-amrani / xtra
PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025
☆13Updated 7 months ago
eric-ai-lab / Discffusion
Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
☆30Updated last year
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆50Updated 3 months ago
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆77Updated 10 months ago
tianyi-lab / R2-T2
[ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆16Updated 7 months ago
AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆30Updated 10 months ago
TencentARC / GRPO-CARE
☆76Updated 4 months ago
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20Updated 8 months ago
theAdamColton / vq-clip
Train vector quantized CLIP models using pytorch lightning
☆20Updated last year
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31Updated 6 months ago
WangFei-2019 / SNARE
Project for SNARE benchmark
☆11Updated last year
qiuzh20 / RMoE
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆24Updated last year