ZrrSkywalker/MAVIS

[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models

☆156

Alternatives and similar repositories for MAVIS

Users who are interested in MAVIS are comparing it to the libraries listed below.

ZrrSkywalker / MathVerse
[ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
☆183 · Apr 28, 2025 · Updated last year
pengshuai-rin / MultiMath
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
☆33 · Jan 22, 2025 · Updated last year
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆91 · Jun 28, 2024 · Updated 2 years ago
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 · Dec 6, 2024 · Updated last year
InternScience / TrustGeoGen
Official repository for "TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving"
☆23 · Sep 1, 2025 · Updated 10 months ago
dle666 / R-CoT
Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
☆217 · Nov 4, 2024 · Updated last year
Ucas-HaoranWei / Slow-Perception
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
☆161 · Jul 28, 2025 · Updated 11 months ago
ZiyuGuo99 / SAM2Point
The Most Faithful Implementation of Segment Anything (SAM) in 3D
☆359 · Sep 11, 2024 · Updated last year
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆126 · Nov 25, 2024 · Updated last year
SCNU203 / GeoQA-Plus
☆20 · May 14, 2024 · Updated 2 years ago
pipilurj / G-LLaVA
Official GitHub repo of G-LLaVA
☆154 · Feb 20, 2025 · Updated last year
DynaMath / DynaMath
A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
☆30 · Nov 25, 2024 · Updated last year
ECNU-ICALK / EduChat-Math
[MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
☆57 · Oct 20, 2024 · Updated last year
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆90 · Jan 23, 2025 · Updated last year
InfiMM / Awesome-Multimodal-LLM-for-Math-STEM
Paper collection on multi-modal LLMs for Math/STEM/Code.
☆145 · May 17, 2026 · Updated 2 months ago
RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year
shilinyan99 / CrossLMM
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
☆25 · Dec 21, 2025 · Updated 7 months ago
lupantech / MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
☆367 · Sep 29, 2025 · Updated 9 months ago
ZiyuGuo99 / Thinking-while-Generating
The first Interleaved framework for textual reasoning within the visual generation process
☆164 · Mar 16, 2026 · Updated 4 months ago
euclid-multimodal / Euclid
☆18 · Jan 9, 2025 · Updated last year
vjmap / vjmap3d-playground
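This project contains all the source code for the VJMAP3D examples. VJMAP3D (唯杰地图3D) is a 3D visualization engine framework built on threejs. With the rich feature set VJMAP3D provides, you can create striking 3D visualization applications in the browser. The framework can be used as a standalone 3D engine for data visualization, product showcases, digital…
☆48 · Mar 11, 2026 · Updated 4 months ago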
elleryqueenhomels / AI_for_Atari
Deep Reinforcement Learning Algorithms for solving Atari 2600 Games
☆143 · Mar 23, 2023 · Updated 3 years ago
InternScience / GeoX
[ICLR'25] Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
☆49 · Jan 25, 2025 · Updated last year
xinyan-cxy / MINT-CoT
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
☆107 · Sep 19, 2025 · Updated 10 months ago
shilinyan99 / PanoVOS
「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
☆21 · Jul 2, 2024 · Updated 2 years ago
Lucky-Lance / SPP
[ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
☆22 · May 28, 2024 · Updated 2 years ago
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447 · Dec 22, 2024 · Updated last year
ShuaiLyu0110 / SQL-o1
SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL
☆197 · May 23, 2025 · Updated last year
FoundationVision / Groma
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆585 · Jun 7, 2024 · Updated 2 years ago
EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,331 · Updated this week
mingliangzhang2018 / PGPS
The implementation of the geometric solver PGPSNet
☆30 · Jul 8, 2026 · Updated 2 weeks ago
uvd / sui-swap-course
☆43 · Dec 21, 2023 · Updated 2 years ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆139 · May 16, 2025 · Updated last year
FractonProtocol / FractonV1
☆153 · Jul 28, 2022 · Updated 3 years ago
OGtwelve / OGTwelveUtilPack
OGtwelve's util pack: contains many different utilities that might be used in real-life development situations
☆111 · Dec 30, 2023 · Updated 2 years ago
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago
SiyangLi99 / open-alteryx-macro
Welcome to the 'Open-Alteryx-Macro' project. This project is aimed at providing an open-source solution for managing and updating Alteryx…
☆156 · May 25, 2024 · Updated 2 years ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆420 · Feb 8, 2025 · Updated last year
libdriver / nrf905
NRF905 full-featured driver library for general-purpose MCU and Linux.
☆78 · Jun 24, 2026 · Updated last month