markywg / transagentLinks

[NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration

☆24

Alternatives and similar repositories for transagent

Users that are interested in transagent are comparing it to the libraries listed below

Sorting:

OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆61Updated 3 months ago
eric-ai-lab / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆162Updated last month
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆61Updated 2 months ago
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆51Updated 3 months ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆26Updated 11 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆59Updated last year
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆61Updated 9 months ago
MME-Benchmarks / MME-Unify
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆41Updated 7 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆67Updated 9 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆89Updated last year
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆70Updated 2 weeks ago
NROwind / OpenGPT-4o-Image
A Comprehensive Dataset for Advanced Image Generation and Editing}
☆29Updated last month
zhijie-group / UniCMs
☆39Updated 6 months ago
Jingfeng0705 / LIFT
The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders
☆37Updated 5 months ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆91Updated 5 months ago
waltonfuture / RL-with-Cold-Start
SFT+RL boosts multimodal reasoning
☆37Updated 4 months ago
TencentARC / GRPO-CARE
☆78Updated 4 months ago
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆90Updated 7 months ago
deep-spin / Infinite-Video
\infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
☆19Updated 9 months ago
TencentARC / SEED-Bench-R1
☆94Updated 4 months ago
inst-it / inst-it
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…
☆37Updated 9 months ago
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆77Updated 4 months ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆92Updated 4 months ago
JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆33Updated last month
inclusionAI / M2-Reasoning
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆46Updated 4 months ago
si0wang / VisVM
☆45Updated 10 months ago
alibaba / conv-llava
☆123Updated last year
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31Updated 3 months ago
isekai-portal / Link-Context-Learning
☆100Updated last year
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated 5 months ago