Ucas-HaoranWei/Slow-Perception

Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step

☆163

Alternatives and similar repositories for Slow-Perception

Users who are interested in Slow-Perception are comparing it to the repositories listed below.

felixludos / alphageometry
☆13 · Oct 10, 2024 · Updated last year
Ucas-HaoranWei / Vary-family
☆57 · Jan 23, 2024 · Updated 2 years ago
Ucas-HaoranWei / Vary-tiny-600k
Vary-tiny codebase built on LAVIS (for training from scratch) and a dataset of PDF image-text pairs (about 600k, including English and Chinese)
☆89 · Sep 21, 2024 · Updated last year
LingyvKong / OneChart
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
☆265 · Apr 14, 2025 · Updated last year
Jayce-Ping / AutoGPS
Code for paper *AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning*
☆17 · Jul 19, 2025 · Updated last year
mingliangzhang2018 / PGDP
The first end-to-end deep learning model for explicit plane geometry diagram parsing.
☆59 · Jun 3, 2026 · Updated last month
ZrrSkywalker / MAVIS
[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
☆156 · Dec 5, 2024 · Updated last year
dle666 / R-CoT
Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
☆216 · Nov 4, 2024 · Updated last year
euclid-multimodal / Euclid
☆18 · Jan 9, 2025 · Updated last year
InternScience / GeoX
[ICLR'25] Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
☆49 · Jan 25, 2025 · Updated last year
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆110 · May 27, 2025 · Updated last year
intervention-training / int
☆16 · Feb 4, 2026 · Updated 5 months ago
GuangyanS / Sys2-LLaVA
☆31 · Feb 10, 2025 · Updated last year
vayvi / HDV
Historical Diagram Vectorization
☆20 · Nov 25, 2025 · Updated 8 months ago
si0wang / VisVM
☆46 · Dec 30, 2024 · Updated last year
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆196 · Mar 17, 2025 · Updated last year
BitSecret / HyperGNet
Geometric Problem Solving Integrating FormalGeo Symbolic System and Hypergraph Neural Network.
☆16 · Sep 23, 2025 · Updated 10 months ago
dle666 / GeoFocus
☆27 · Jul 5, 2026 · Updated 3 weeks ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 · Jul 1, 2024 · Updated 2 years ago
zezeze97 / DFE-GPS
☆14 · Jul 15, 2025 · Updated last year
shiwk24 / MathCanvas
This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning"
☆80 · Apr 14, 2026 · Updated 3 months ago
InfiMM / Awesome-Multimodal-LLM-for-Math-STEM
Paper collections of multi-modal LLM for Math/STEM/Code.
☆145 · May 17, 2026 · Updated 2 months ago
1694439208 / GOT-OCR-Inference
Exploring how to deploy and accelerate GOT-OCR in real projects, in any language
☆62 · Oct 24, 2024 · Updated last year
zai-org / CogCoM
☆222 · Jul 5, 2024 · Updated 2 years ago
Ucas-HaoranWei / Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
☆1,890 · Dec 30, 2024 · Updated last year
chengruogu0915 / GeoUni
Repository for GeoUni, A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions.
☆23 · Jun 12, 2025 · Updated last year
pipilurj / G-LLaVA
Official GitHub repo of G-LLaVA
☆154 · Feb 20, 2025 · Updated last year
Ruiyang-061X / Awesome-MLLM-Reasoning
📖 Curated list about the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
☆13 · Feb 7, 2025 · Updated last year
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆90 · Jan 23, 2025 · Updated last year
lupantech / InterGPS
Data and Code for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning"
☆178 · Mar 29, 2025 · Updated last year
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆581 · Apr 13, 2025 · Updated last year
Open-Reasoner-Zero / Open-Vision-Reasoner
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…
☆157 · Sep 12, 2025 · Updated 10 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆91 · Jun 28, 2024 · Updated 2 years ago
Ucas-HaoranWei / GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆8,208 · Feb 10, 2025 · Updated last year
tpgh24 / ag4masses
Making Google DeepMind's AlphaGeometry accessible to the Masses
☆63 · Jan 9, 2025 · Updated last year
LINs-lab / GMem
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models
☆43 · Mar 11, 2025 · Updated last year
ChenShawn / MultiModal-Jupyter-Sandbox
Simple code sandbox supporting Jupyter-notebook-style code execution. Used for agent training.
☆25 · Dec 5, 2025 · Updated 7 months ago
NeuraSearch / Geometry-Diagram-Description
☆15 · Jul 22, 2024 · Updated 2 years ago
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆331 · Jan 20, 2025 · Updated last year