[NeurIPS-2024] The official Implementation of "Instruction-Guided Visual Masking"
☆41 Nov 15, 2024 Updated last year
Alternatives and similar repositories for IVM
Users who are interested in IVM are comparing it to the repositories listed below.
- Server Usage Documentation of AIR ☆22 Feb 22, 2023 Updated 3 years ago
- LMM for VQA, tcsvt version ☆11 Jul 19, 2024 Updated last year
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le… ☆13 Jan 16, 2025 Updated last year
- Official Implementation of Video-MA2MBA ☆12 Dec 3, 2024 Updated last year
- [ICRA'25] H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps ☆12 Apr 10, 2025 Updated 10 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 Feb 13, 2025 Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 Nov 22, 2024 Updated last year
- On Path to Multimodal Generalist: General-Level and General-Bench ☆18 Jul 11, 2025 Updated 7 months ago
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024] ☆14 Jul 11, 2024 Updated last year
- Official code repository of Shuffle-R1 ☆25 Feb 23, 2026 Updated last week
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆40 Feb 20, 2025 Updated last year
- ☆42 Jul 9, 2025 Updated 7 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 Jun 4, 2025 Updated 8 months ago
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models ☆22 Jul 10, 2025 Updated 7 months ago
- ☆16 Jul 23, 2024 Updated last year
- [AAMAS'26] xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing ☆24 Jan 8, 2026 Updated last month
- This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR2025) ☆26 Sep 6, 2025 Updated 5 months ago
- [ICML 2024] The official Implementation of "DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning" ☆82 May 26, 2025 Updated 9 months ago
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆52 Dec 5, 2024 Updated last year
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 Feb 22, 2026 Updated last week
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning ☆20 Dec 21, 2023 Updated 2 years ago
- This is the official implementation of "GvSeg: General and Task-Oriented Video Segmentation" (Accepted at ECCV 2024). ☆18 Jul 15, 2024 Updated last year
- [ACM MM 2025] MLLMs for Aesthetics Reasoning ☆23 Jan 5, 2026 Updated last month
- ☆43 May 6, 2024 Updated last year
- ☆30 Jan 18, 2026 Updated last month
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 Oct 3, 2025 Updated 4 months ago
- [ICML 2025] The Official Implementation of "Efficient Robotic Policy Learning via Latent Space Backward Planning" ☆30 Dec 15, 2025 Updated 2 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 Aug 5, 2024 Updated last year
- [CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering ☆25 Apr 24, 2025 Updated 10 months ago
- ☆28 Apr 8, 2025 Updated 10 months ago
- Code release for "RoboPrompt" ☆27 Sep 30, 2025 Updated 5 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆59 Jan 5, 2026 Updated last month
- Official implementation of "URECA : Unique Region Caption Anything" ☆57 Jul 13, 2025 Updated 7 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 Jul 22, 2025 Updated 7 months ago
- [ICLR 2024] The official implementation of "Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model" ☆119 Feb 11, 2025 Updated last year
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆55 Feb 1, 2026 Updated last month
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 Updated this week
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 Jun 28, 2025 Updated 8 months ago
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow" ☆31 Apr 20, 2025 Updated 10 months ago