microsoft/Magma

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

☆1,936

Alternatives and similar repositories for Magma

Users that are interested in Magma are comparing it to the libraries listed below.

microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
☆25,190 · Updated this week
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,845 · Mar 12, 2026 · Updated 4 months ago
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,014 · Jul 7, 2026 · Updated 2 weeks ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,664 · Jan 30, 2026 · Updated 5 months ago
Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆2,263 · Oct 29, 2025 · Updated 8 months ago
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆2,329 · Apr 13, 2026 · Updated 3 months ago
NVIDIA / cosmos
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomou…
☆11,243 · Updated this week
LLaVA-VL / LLaVA-NeXT
☆4,713 · Jun 15, 2026 · Updated last month
showlab / ShowUI
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
☆1,887 · Apr 24, 2026 · Updated 3 months ago
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆6,119 · May 4, 2026 · Updated 2 months ago
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
☆10,102 · Sep 22, 2025 · Updated 10 months ago
microsoft / SoM
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
☆1,549 · Aug 19, 2024 · Updated last year
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆4,065 · May 19, 2025 · Updated last year
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,942 · Aug 12, 2024 · Updated last year
Genesis-Embodied-AI / genesis-world
Simulation platform for general-purpose robotics & embodied AI learning.
☆29,636 · Updated this week
facebookresearch / vjepa2
PyTorch code and models for VJEPA2 self-supervised learning from video.
☆4,392 · Mar 23, 2026 · Updated 4 months ago
bytedance / UI-TARS
Pioneering Automated GUI Interaction with Native Agents
☆11,225 · Jan 27, 2026 · Updated 5 months ago
facebookresearch / sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…
☆19,595 · May 30, 2026 · Updated last month
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago
manycore-research / SpatialLM
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
☆4,622 · Jun 26, 2026 · Updated 3 weeks ago
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,989 · Updated this week
LatentActionPretraining / LAPA
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
☆560 · Jan 22, 2025 · Updated last year
OpenDriveLab / AgiBot-World
[IROS 2025 Best Paper Award Finalist & IEEE TRO 2026] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
☆3,105 · May 29, 2026 · Updated last month
openvla / openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
☆6,699 · Mar 23, 2025 · Updated last year
camel-ai / owl
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
☆20,063 · Updated this week
simpler-env / SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo…
☆1,127 · Dec 20, 2025 · Updated 7 months ago
deepseek-ai / Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
☆17,752 · Feb 1, 2025 · Updated last year
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,964 · Jan 8, 2026 · Updated 6 months ago
Gen-Verse / MMaDA
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)
☆1,660 · Feb 14, 2026 · Updated 5 months ago
Physical-Intelligence / openpi
☆12,978 · Jun 16, 2026 · Updated last month
landing-ai / vision-agent
This tool has been deprecated. Use Agentic Document Extraction instead.
☆5,290 · Jan 29, 2026 · Updated 5 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆882 · Dec 14, 2025 · Updated 7 months ago
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆734 · Aug 5, 2025 · Updated 11 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,583 · Jun 14, 2025 · Updated last year
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,654 · Updated this week
microsoft / PIKE-RAG
PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation
☆2,476 · Sep 10, 2025 · Updated 10 months ago
EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆451 · Mar 19, 2025 · Updated last year
UMass-Embodied-AGI / 3D-VLA
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
☆629 · Oct 29, 2024 · Updated last year
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,433 · Jan 12, 2026 · Updated 6 months ago