MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35Jul 1, 2024Updated last year
Alternatives and similar repositories for MM-Instruct
Users that are interested in MM-Instruct are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for ACL 2022 paper "HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization".☆13May 24, 2022Updated 3 years ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"☆13Jun 11, 2023Updated 2 years ago
- ☆12Dec 4, 2024Updated last year
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Feb 12, 2025Updated last year
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆86Jan 27, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆72Jul 10, 2024Updated last year
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆124Jul 27, 2024Updated last year
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM☆60May 28, 2024Updated last year
- [ICLR 2025] Official code implementation of DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation☆132Feb 23, 2025Updated last year
- The official implement of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆20Jul 21, 2025Updated 9 months ago
- ☆13Jul 10, 2024Updated last year
- CS194-196 Course Project☆14Feb 20, 2025Updated last year
- Image Tokenizer Needs Post-Training☆24Oct 4, 2025Updated 6 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆36Jul 15, 2025Updated 9 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- ☆44Jul 9, 2025Updated 9 months ago
- ☆22Mar 23, 2025Updated last year
- ☆14Apr 25, 2025Updated last year
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆148Aug 21, 2025Updated 8 months ago
- ☆16Jul 23, 2024Updated last year
- Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…☆55Aug 27, 2025Updated 8 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆44Mar 11, 2025Updated last year
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models☆59Apr 1, 2026Updated last month
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs☆309May 21, 2025Updated 11 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆50Jan 8, 2025Updated last year
- Official code implementation of Slow Perception:Let's Perceive Geometric Figures Step-by-step☆160Jul 28, 2025Updated 9 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models☆91Apr 4, 2024Updated 2 years ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆41Jun 14, 2025Updated 10 months ago
- ☆24May 23, 2025Updated 11 months ago
- KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding☆67Apr 5, 2026Updated 3 weeks ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆282Jun 25, 2024Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- Developer project for getting basic API integrations working in under 5 minutes☆11Jan 30, 2026Updated 3 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆419May 5, 2025Updated 11 months ago
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆18Nov 4, 2025Updated 5 months ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models☆156Apr 30, 2024Updated 2 years ago
- ☆27Jan 29, 2025Updated last year
- ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy☆305May 14, 2025Updated 11 months ago