microsoft / Magma
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
☆1,481 Updated this week
Alternatives and similar repositories for Magma:
Users who are interested in Magma are comparing it to the repositories listed below:
- Witness the aha moment of VLM with less than $3. ☆3,376 Updated 3 weeks ago
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆1,910 Updated 2 months ago
- [ICLR 2025] Agent S: an open agentic framework that uses computers like a human ☆1,356 Updated this week
- [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. ☆1,127 Updated 2 weeks ago
- ☆743 Updated this week
- An open-sourced end-to-end VLM-based GUI Agent ☆837 Updated last month
- An Open Large Reasoning Model for Real-World Solutions ☆1,475 Updated 3 weeks ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention ☆2,382 Updated last week
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆2,139 Updated 3 months ago
- Frontier Multimodal Foundation Models for Image and Video Understanding ☆664 Updated this week
- Search-o1: Agentic Search-Enhanced Large Reasoning Models ☆735 Updated 3 weeks ago
- OctoTools: An agentic framework with extensible tools for complex reasoning ☆988 Updated last week
- Fully open data curation for reasoning models ☆1,576 Updated last week
- ☆3,340 Updated last month
- Solve Visual Understanding with Reinforced VLMs ☆4,305 Updated this week
- A live stream development of RL tuning for LLM agents ☆1,883 Updated this week
- Everything about the SmolLM2 and SmolVLM family of models ☆2,049 Updated this week
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆1,013 Updated last month
- ☆1,348 Updated 4 months ago
- Out-of-the-box (OOTB) GUI Agent for Windows and macOS ☆1,444 Updated 2 weeks ago
- Code for the Molmo Vision-Language Model ☆339 Updated 3 months ago
- Vision agent ☆4,420 Updated this week
- free and open OpenAI Deep Research ☆480 Updated last month
- Codebase for Aria - an Open Multimodal Native MoE ☆1,025 Updated 2 months ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,149 Updated this week
- [NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge a… ☆2,051 Updated 3 weeks ago
- A mini, open-weights, version of our Proxy assistant. ☆802 Updated last month
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL ☆1,389 Updated this week
- Democratizing Reinforcement Learning for LLMs ☆2,113 Updated last month
- ☆2,481 Updated this week