MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35Jul 1, 2024Updated last year
Alternatives and similar repositories for MM-Instruct
Users that are interested in MM-Instruct are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for ACL 2022 paper "HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization".☆13May 24, 2022Updated 3 years ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"☆13Jun 11, 2023Updated 2 years ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Feb 12, 2025Updated last year
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆85Jan 27, 2025Updated last year
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆125Jul 27, 2024Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM☆60May 28, 2024Updated last year
- The official implement of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆20Jul 21, 2025Updated 10 months ago
- ☆13Jul 10, 2024Updated last year
- CS194-196 Course Project☆14Feb 20, 2025Updated last year
- Image Tokenizer Needs Post-Training☆24Oct 4, 2025Updated 7 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆37May 9, 2026Updated last week
- ☆44Jul 9, 2025Updated 10 months ago
- ☆14Apr 25, 2025Updated last year
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆151Aug 21, 2025Updated 9 months ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆22Aug 5, 2024Updated last year
- ☆16Jul 23, 2024Updated last year
- ☆12Jun 19, 2024Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆44Mar 11, 2025Updated last year
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs☆309May 21, 2025Updated last year
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆50Jan 8, 2025Updated last year
- Official code implementation of Slow Perception:Let's Perceive Geometric Figures Step-by-step☆160Jul 28, 2025Updated 9 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- Modality Gap Theory☆71Updated this week
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆41Jun 14, 2025Updated 11 months ago
- ☆24May 23, 2025Updated 11 months ago
- KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding☆68Apr 5, 2026Updated last month
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆281Jun 25, 2024Updated last year
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆421May 5, 2025Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆18Nov 4, 2025Updated 6 months ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models☆156Apr 30, 2024Updated 2 years ago
- A repository used to organize content related to Large Speech(Audio) Model, including paper, data, applications, tools and so on.☆28Nov 8, 2025Updated 6 months ago
- ☆27Jan 29, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy☆305May 14, 2025Updated last year
- A Survey of Neural Dialogue Systems☆19Dec 31, 2021Updated 4 years ago
- [ACL'26 Findings] Official code for "BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search"☆29Apr 23, 2026Updated 3 weeks ago
- [3DV 2025] Learning Naturally Aggregated Appearance for Efficient 3D Editing☆33Feb 13, 2025Updated last year
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆35Jun 12, 2025Updated 11 months ago
- [ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆43Oct 28, 2025Updated 6 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆44Jul 2, 2025Updated 10 months ago