[NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to improve performance of numerous vision language performances for diverse capabilities.
☆116May 30, 2024Updated last year
Alternatives and similar repositories for Meteor
Users that are interested in Meteor are comparing it to the libraries listed below
Sorting:
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…☆99Jun 23, 2024Updated last year
- [ACL 2024 Findings] Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mO…☆99Jun 28, 2024Updated last year
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.☆19Dec 27, 2024Updated last year
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor…☆329Mar 28, 2024Updated last year
- Official Implementation of Video-MA2MBA☆12Dec 3, 2024Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs☆181Oct 14, 2024Updated last year
- ☆16Jul 23, 2024Updated last year
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆63Oct 9, 2024Updated last year
- Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…☆46Jun 12, 2025Updated 8 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆42Oct 19, 2025Updated 4 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆86Mar 21, 2024Updated last year
- Official PyTorch Implementation Code for Developing Super Fast Adversarial Training with Distributed Data Parallel, Channel Last Memory F…☆33Mar 13, 2023Updated 2 years ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆22Aug 5, 2024Updated last year
- ☆124Jul 29, 2024Updated last year
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models☆51Feb 23, 2026Updated last week
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆222Oct 20, 2025Updated 4 months ago
- Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…☆13Jun 24, 2024Updated last year
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆203Jun 18, 2025Updated 8 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆108May 29, 2025Updated 9 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- Official PyTorch implementation Source code for Adaptive Self-Training Framework for Fine-grained Scene Graph generation (ST-SGG), accept…☆22Jan 30, 2024Updated 2 years ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆47Aug 21, 2024Updated last year
- ☆47Nov 8, 2024Updated last year
- Some papers about *diverse* image (a few videos) captioning☆26Apr 4, 2023Updated 2 years ago
- [CVPR 2023] Official PyTorch Implementation for "Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust N…☆45Jul 18, 2023Updated 2 years ago
- ☆17Aug 1, 2024Updated last year
- Long Context Transfer from Language to Vision☆402Mar 18, 2025Updated 11 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆34Nov 13, 2024Updated last year
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Jul 15, 2025Updated 7 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆86Oct 26, 2025Updated 4 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆100Jan 30, 2024Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- The official implementation of Preference Data Reward-Augmentation.☆18May 1, 2025Updated 10 months ago
- [CVPR 2022] Official PyTorch Implementation for "Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network"☆32Mar 13, 2023Updated 2 years ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆162Jun 8, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"☆151Sep 10, 2024Updated last year