njucckevin/MM-Self-Improve


A Self-Training Framework for Vision-Language Reasoning

☆90

Alternatives and similar repositories for MM-Self-Improve

Users that are interested in MM-Self-Improve are comparing it to the libraries listed below.

Liac-li / MM-self-improve-qwen2vl
☆13 · Dec 9, 2024 · Updated last year
OS-Copilot / OS-Sentinel
[ACL 2026] Code, benchmark and environment for "OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic…
☆49 · Jul 5, 2026 · Updated 3 weeks ago
njucckevin / CapArena
An Arena-style Automated Evaluation Benchmark for Detailed Captioning
☆59 · Jun 1, 2025 · Updated last year
xufangzhi / Odyssey-Arena
Extremely Long-Horizon Agentic Tasks Requiring Active Acting and Inductive Reasoning
☆33 · Feb 9, 2026 · Updated 5 months ago
TideDra / VL-RLHF
An RLHF Infrastructure for Vision-Language Models
☆201 · Nov 15, 2024 · Updated last year
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆75 · Jul 13, 2025 · Updated last year
CONE-MT / MindMerger
☆32 · Feb 8, 2025 · Updated last year
wjn1996 / Chain-of-Knowledge
☆24 · Jun 13, 2023 · Updated 3 years ago
FanqingM / MM-Eureka-V0
MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.
☆325 · Jun 21, 2025 · Updated last year
si0wang / VisVM
☆46 · Dec 30, 2024 · Updated last year
OS-Copilot / OS-Symphony
[ACL 2026 Main] Official repository for paper: OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agents
☆48 · Apr 7, 2026 · Updated 3 months ago
LuLuLuyi / LongHeads
[EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor
☆32 · Apr 8, 2024 · Updated 2 years ago
zzli2022 / TLDR
Code for Research Project TLDR
☆26 · Jul 28, 2025 · Updated last year
ZrrSkywalker / MAVIS
[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
☆156 · Dec 5, 2024 · Updated last year
njucckevin / OpenMobile-Code
The model, data and code for OpenMobile
☆50 · Jul 9, 2026 · Updated 2 weeks ago
chuyg1005 / seeclick-crawler
☆20 · Apr 24, 2024 · Updated 2 years ago
eternal8080 / MV-MATH
Description for MV-MATH
☆15 · Jul 20, 2025 · Updated last year
dongyh20 / Insight-V
[CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆240 · Nov 7, 2025 · Updated 8 months ago
xcltql666 / DenseDiT
Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios"
☆27 · Jun 7, 2026 · Updated last month
MJ-Bench / MJ-Bench
(NeurIPS 2025) Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
☆51 · Jun 3, 2025 · Updated last year
Kun-Xiang / AtomThink
[TPAMI 2026] Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning"
☆66 · Nov 18, 2025 · Updated 8 months ago
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆493 · Jul 13, 2025 · Updated last year
zai-org / CogCoM
☆222 · Jul 5, 2024 · Updated 2 years ago
MuyeHuang / EvoChart
☆19 · Nov 3, 2025 · Updated 8 months ago
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37 · Dec 29, 2024 · Updated last year
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆110 · May 27, 2025 · Updated last year
open-nlplab / fastchatgpt
A Python tool that helps you interact with ChatGPT.
☆10 · Dec 11, 2022 · Updated 3 years ago
chang-github-00 / Predictive-Decoding
Repo for anonymous purposes; please don't distribute
☆10 · Oct 2, 2024 · Updated last year
RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year
yxzwang / FamilyTool
FamilyTool benchmark
☆14 · Sep 10, 2025 · Updated 10 months ago
RUCBM / GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
☆143 · Mar 1, 2026 · Updated 4 months ago
njucckevin / KnowCap
Code for "Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model"
☆13 · Feb 15, 2024 · Updated 2 years ago
hqhQAQ / Hint-GRPO
[ICCV 2025] Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
☆48 · Jul 1, 2025 · Updated last year
hewei2001 / ReachQA
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
☆61 · Aug 25, 2025 · Updated 11 months ago
yayayacc / MUR
☆49 · May 14, 2026 · Updated 2 months ago
zihuixue / seeAoT
Code and data release for the paper "Seeing the Arrow of Time in Large Multimodal Models"
☆16 · Oct 2, 2025 · Updated 9 months ago
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,594 · Feb 8, 2025 · Updated last year
RifleZhang / LLaVA-Reasoner-DPO
☆116 · Jan 8, 2025 · Updated last year
chengyou-jia / T2IS
Official Repo for "Why Settle for One? Text-to-ImageSet Generation and Evaluation"
☆21 · Oct 1, 2025 · Updated 9 months ago