360CVGroup / Inner-Adaptor-Architecture

An LMM that solves catastrophic forgetting, AAAI 2025

☆44

Alternatives and similar repositories for Inner-Adaptor-Architecture

Users who are interested in Inner-Adaptor-Architecture are comparing it to the repositories listed below.

foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆58 Updated last year
Kevinz-code / SeVa
[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501
☆59 Updated last year
zhijie-group / Orthus
☆64 Updated 6 months ago
invictus717 / MiCo
[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
☆121 Updated last year
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆67 Updated 10 months ago
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆41 Updated 5 months ago
MME-Benchmarks / MME-Unify
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆41 Updated 8 months ago
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆40 Updated last month
mutonix / Vript
☆156 Updated 10 months ago
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆73 Updated 2 months ago
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆98 Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆72 Updated last year
findalexli / mllm-dpo
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆48 Updated last year
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43 Updated last year
waltonfuture / RL-with-Cold-Start
SFT+RL boosts multimodal reasoning
☆39 Updated 5 months ago
EvolvingLMMs-Lab / VideoMMMU
☆62 Updated 3 months ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95 Updated last year
alibaba / conv-llava
☆124 Updated last year
mshukor / ima-lmms
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆22 Updated last year
shikiw / Modality-Integration-Rate
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…
☆107 Updated 5 months ago
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆133 Updated 6 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆68 Updated 7 months ago
JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆34 Updated 3 weeks ago
BytedanceDouyinContent / SAIL-VL2
The SAIL-VL2 series model developed by the BytedanceDouyinContent Group
☆75 Updated 2 months ago
BriansIDP / video-SALMONN-o1
☆37 Updated 3 months ago
ZhangXJ199 / TinyLLaVA-Video
A Simple Framework of Small-scale LMMs for Video Understanding
☆105 Updated 5 months ago
InternLM / ARM-Thinker
Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
☆40 Updated this week
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆62 Updated 9 months ago
path2generalist / General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
☆19 Updated 4 months ago
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆80 Updated 4 months ago