mbzuai-oryx/EvoLMM

mbzuai-oryx / EvoLMM

Self Evolving Large Multimodal Models with Continuous Rewards

☆25

Alternatives and similar repositories for EvoLMM

Users who are interested in EvoLMM are comparing it to the repositories listed below.

bruno686 / VisPlay
[CVPR'26] VisPlay: Self-Evolving Vision-Language Models
☆65 · Feb 25, 2026 · Updated 5 months ago
Amshaker / MAVOS
[WACV 2025] Efficient Video Object Segmentation via Modulated Cross-Attention Memory
☆61 · Feb 28, 2025 · Updated last year
amandpkr / GMNR
(ICCV 2023) Generative Multiplane Neural Radiance for 3D Aware Image Generation.
☆18 · Sep 28, 2023 · Updated 2 years ago
umair1221 / WorldCache
WorldCache: Content-Aware Caching for Accelerated Video World Models
☆23 · Jun 28, 2026 · Updated last month
HL-hanlin / Bifrost-1
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
☆47 · Nov 24, 2025 · Updated 8 months ago
genmilab / MedMO
MedMO: Medical Foundation Model
☆27 · Apr 8, 2026 · Updated 3 months ago
mbzuai-oryx / AIN
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding…
☆55 · Mar 13, 2025 · Updated last year
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56 · Jul 5, 2025 · Updated last year
Amshaker / Mobile-O
[CVPR'26 Demo] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
☆154 · Apr 13, 2026 · Updated 3 months ago
OmkarThawakar / Self-Learning-Robot
Reinforcement Training of Robot
☆11 · Dec 1, 2019 · Updated 6 years ago
zli12321 / Vision-SR1
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
☆175 · Mar 14, 2026 · Updated 4 months ago
vis-nlp / OpenCQA
☆13 · Jun 20, 2023 · Updated 3 years ago
ShahinaKK / LG_SDG
Language Grounded Single Source Domain Generalization in Medical Image Segmentation [ISBI2024]
☆33 · Oct 27, 2024 · Updated last year
mbzuai-oryx / ClimateGPT
[EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabi…
☆79 · Sep 24, 2024 · Updated last year
YunxinLi / Multimodal-Context-Reasoning
A multimodal context reasoning approach that introduces multi-view semantic alignment information via prefix tuning.
☆15 · Sep 14, 2023 · Updated 2 years ago
microsoft / MM-WebAgent
Build coherent and visually polished multimodal webpages with hierarchical planning, AIGC tools, and iterative reflection.
☆15 · May 17, 2026 · Updated 2 months ago
mbzuai-oryx / Camel-Bench
[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
☆38 · Apr 17, 2025 · Updated last year
yichengchen24 / DataChef
☆25 · Feb 12, 2026 · Updated 5 months ago
Weifeng2Wu / ICDAR-2023-DTT-in-Images-1
☆12 · Mar 20, 2023 · Updated 3 years ago
Lucanyc / VISTA-Gym
☆27 · Mar 17, 2026 · Updated 4 months ago
mbzuai-oryx / Agent-X
ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆43 · Apr 28, 2026 · Updated 3 months ago
iabh1shekbasu / CalibPrompt
[BMVC 2025 🔥] CalibPrompt is the first framework that enhances Med-VLM calibration during prompt tuning.
☆16 · Jul 13, 2026 · Updated 2 weeks ago
ZJU-REAL / GUI-RCPO
[AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615
☆67 · Nov 8, 2025 · Updated 8 months ago
Muzammal-Naseer / SAT
Official repository for "Stylized Adversarial Training" (TPAMI 2022)
☆11 · Dec 30, 2022 · Updated 3 years ago
w-yibo / VTC-R1
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.
☆26 · Jul 20, 2026 · Updated last week
huggingface / docmatix
A huge dataset for Document Visual Question Answering
☆24 · Jul 29, 2024 · Updated 2 years ago
tajwarfahim / srt
Official implementation for the paper "Can Large Reasoning Models Self-Train?"
☆76 · Jul 9, 2026 · Updated 2 weeks ago
AIGeeksGroup / PresentAgent-2
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
☆17 · Jun 5, 2026 · Updated last month
Jasper-Yan / SCRL
[ACL'26] Official Repository for The Paper: What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time
☆15 · Apr 7, 2026 · Updated 3 months ago
HKU-MMLab / Math-VR-CodePlot-CoT
Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
☆63 · Nov 4, 2025 · Updated 8 months ago
apple / ml-mebp
☆39 · Oct 29, 2025 · Updated 9 months ago
xinwong / TAPT
[CVPR 2025] TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
☆15 · May 21, 2026 · Updated 2 months ago
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33 · Jul 21, 2023 · Updated 3 years ago
he-nantian / ReDiffuser
ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation
☆15 · Jun 2, 2024 · Updated 2 years ago
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆43 · Apr 10, 2025 · Updated last year
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆307 · May 21, 2025 · Updated last year
EvolvingLMMs-Lab / sae
A framework that allows you to apply Sparse AutoEncoder on any models
☆53 · Jul 11, 2025 · Updated last year
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆62 · May 2, 2024 · Updated 2 years ago
xiemk / SSMLL-CAP
☆12 · Mar 7, 2024 · Updated 2 years ago