jihaonew/MM-Instruct

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jihaonew/MM-Instruct)

jihaonew / MM-Instruct

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

☆35

Alternatives and similar repositories for MM-Instruct

Users that are interested in MM-Instruct are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jylins / hourllava
View on GitHub
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19Jun 24, 2025Updated last year
ShuyangCao / hibrids_summ
View on GitHub
Code for ACL 2022 paper "HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization".
☆13May 24, 2022Updated 4 years ago
mm-vl / ULM-R1
View on GitHub
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
☆48Jul 22, 2025Updated last year
e-bug / fine-grained-evals
View on GitHub
[ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"
☆13Jun 11, 2023Updated 3 years ago
mahtabbigverdi / Aurora
View on GitHub
☆12Dec 4, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
RenShuhuai-Andy / NBP
View on GitHub
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆42Feb 12, 2025Updated last year
zwq2018 / Multi-modal-Self-instruct
View on GitHub
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆85Jan 27, 2025Updated last year
orrzohar / Video-STaR
View on GitHub
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆72Jul 10, 2024Updated 2 years ago
shengliu66 / FractionalReason
View on GitHub
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17Jun 30, 2025Updated last year
longvideobench / LongVideoBench
View on GitHub
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆133Jul 27, 2024Updated last year
Zoeyyao27 / SirLLM
View on GitHub
This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM
☆60May 28, 2024Updated 2 years ago
yuangpeng / dreambench_plus
View on GitHub
[ICLR 2025] Official code implementation of DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
☆137Feb 23, 2025Updated last year
MCG-NJU / RGE
View on GitHub
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
☆15Nov 29, 2025Updated 7 months ago
ALT-JS / OthelloSAE
View on GitHub
CS194-196 Course Project
☆14Feb 20, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
lancercat / OSOCR
View on GitHub
☆10Nov 21, 2023Updated 2 years ago
Fay-Y / Diffusion-RSCC
View on GitHub
☆22Mar 23, 2025Updated last year
zhouyiks / CoLVA
View on GitHub
☆44Jul 9, 2025Updated last year
lcqysl / FrameThinker
View on GitHub
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50Oct 9, 2025Updated 9 months ago
snap-research / VIMI
View on GitHub
☆13Jul 10, 2024Updated 2 years ago
lzw-lzw / UnifiedMLLM
View on GitHub
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
☆22Aug 5, 2024Updated last year
alexmartin1722 / wikivideo
View on GitHub
WikiVideo: Article Generation from Multiple Videos
☆15Nov 14, 2025Updated 8 months ago
Elvin-Yiming-Du / Memory-T1
View on GitHub
This respository is used for time reasoning task for mult-session dialogue system.
☆16Feb 7, 2026Updated 5 months ago
lambert-x / ProLab
View on GitHub
Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…
☆55Aug 27, 2025Updated 10 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
zhishuifeiqian / VCR-Bench
View on GitHub
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
☆37May 9, 2026Updated 2 months ago
NVlabs / FRAG
View on GitHub
☆15Apr 25, 2025Updated last year
qiuk2 / RobusTok
View on GitHub
Image Tokenizer Needs Post-Training
☆24Oct 4, 2025Updated 9 months ago
showlab / MovieSeq
View on GitHub
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46Mar 11, 2025Updated last year
mbzuai-oryx / LlamaV-o1
View on GitHub
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆307May 21, 2025Updated last year
yale-nlp / refdpo
View on GitHub
☆16Jul 23, 2024Updated last year
Open-Reasoner-Zero / Open-Vision-Reasoner
View on GitHub
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…
☆157Sep 12, 2025Updated 10 months ago
franciszzj / OpenPSG
View on GitHub
[ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
☆50Jan 8, 2025Updated last year
ConiferLM / Conifer
View on GitHub
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
☆91Apr 4, 2024Updated 2 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
Ucas-HaoranWei / Slow-Perception
View on GitHub
Official code implementation of Slow Perception:Let's Perceive Geometric Figures Step-by-step
☆161Jul 28, 2025Updated 11 months ago
BZX667 / DMPO
View on GitHub
☆12Jun 19, 2024Updated 2 years ago
LuLuLuyi / TDAR
View on GitHub
Advancing Block Diffusion Language Models for Test-Time Scaling
☆16Feb 14, 2026Updated 5 months ago
TerminologyHub / termhub-in-5-minutes
View on GitHub
Developer project for getting basic API integrations working in under 5 minutes
☆11May 22, 2026Updated last month
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
HaroldChen19 / VistaDPO
View on GitHub
[ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
☆41Jun 14, 2025Updated last year
OpenGVLab / OmniCorpus
View on GitHub
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425May 5, 2025Updated last year