foundation-multimodal-models / ConBench
[NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
☆30Updated last month
Related projects
Alternatives and complementary repositories for ConBench
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin…☆56Updated last month
- A paper list of recent works on token compression for ViT and VLM☆152Updated this week
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆51Updated 2 months ago
- [ICCV 2023 oral] This is the official repository for our paper: "Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning".☆64Updated last year
- [NeurIPS'22] This is an official implementation for "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning".☆173Updated last year
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆54Updated 2 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆78Updated 8 months ago
- The official implementation of "MLLMs-Augmented Visual-Language Representation Learning"☆31Updated 8 months ago
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129)☆91Updated last year
- Dataset pruning for ImageNet and LAION-2B.☆69Updated 4 months ago
- This is a repo to track the latest autoregressive visual generation papers.☆52Updated this week
- A collection of visual instruction tuning datasets.☆75Updated 8 months ago
- Official repository for CoMM Dataset☆24Updated 2 months ago
- [ICCV 23] An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging tech…☆89Updated last year
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT …☆31Updated last year
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆45Updated 3 weeks ago
- A RLHF Infrastructure for Vision-Language Models☆111Updated last week
- The official implementation of "2024NeurIPS Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation"☆36Updated last month
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.☆26Updated last year
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at NeurIPS 2023 (Main confer…☆22Updated 7 months ago
- [ICLR2024] Exploring Target Representations for Masked Autoencoders☆51Updated 10 months ago
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning☆44Updated 6 months ago
- [CVPR-22] This is the official implementation of the paper "AdaViT: Adaptive Vision Transformers for Efficient Image Recognition".☆49Updated 2 years ago