amazon-science / QA-ViT
☆45 · Updated 2 months ago
Related projects:
- Official implementation of the Law of Vision Representation in MLLMs ☆93 · Updated last week
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆100 · Updated 2 months ago
- Matryoshka Multimodal Models ☆67 · Updated 3 weeks ago
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 · Updated 3 weeks ago
- [ACL'24 Oral] Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆39 · Updated last week
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆61 · Updated 4 months ago
- Multimodal Video Understanding Framework (MVU) ☆23 · Updated 4 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆75 · Updated 5 months ago
- Dense Connector for MLLMs ☆98 · Updated last month
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings ☆43 · Updated last year
- Official repository of the paper "Subobject-level Image Tokenization" ☆58 · Updated 4 months ago
- Official repository of the paper "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs" ☆39 · Updated 3 weeks ago
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts" ☆76 · Updated 4 months ago
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆35 · Updated last month
- Language Repository for Long Video Understanding ☆27 · Updated 3 months ago
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning ☆93 · Updated 2 months ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 · Updated 3 weeks ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆96 · Updated 4 months ago
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆116 · Updated last week
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆81 · Updated 5 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆44 · Updated 2 months ago
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆27 · Updated 3 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆85 · Updated last week
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆67 · Updated 5 months ago
- Object Recognition as Next Token Prediction (CVPR 2024) ☆153 · Updated last month
- LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆37 · Updated 2 months ago