kyegomez / BRAVE-ViT-Swarm
Implementation of the paper: "BRAVE: Broadening the visual encoding of vision-language models"
☆21Updated this week
Related projects
Alternatives and complementary repositories for BRAVE-ViT-Swarm
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆70Updated 2 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation☆36Updated last month
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆45Updated last month
- ☆22Updated 4 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"☆34Updated this week
- Multimodal Video Understanding Framework (MVU)☆23Updated 6 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"☆110Updated 3 months ago
- A vast array of Multi-Modal Embodied Robotic Foundation Models!☆24Updated 8 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆96Updated 2 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆107Updated 4 months ago
- ☆38Updated 3 months ago
- ☆30Updated this week
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"☆52Updated last month
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis"☆49Updated 2 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆38Updated 7 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling☆30Updated 7 months ago
- ☆33Updated 4 months ago
- Holistic evaluation of multimodal foundation models☆41Updated 3 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆59Updated 5 months ago
- ☆16Updated last month
- ☆35Updated 3 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆17Updated 3 weeks ago
- ☆16Updated 3 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight)☆161Updated last month
- A simple reproducible template to implement AI research papers☆23Updated 2 months ago
- Matryoshka Multimodal Models☆82Updated this week
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)☆109Updated 7 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆16Updated last week
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆35Updated 2 months ago