locuslab / llava-token-compression
☆25Updated last week
Related projects ⓘ
Alternatives and complementary repositories for llava-token-compression
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆17Updated 4 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆14Updated last month
- ☆29Updated this week
- Official Repository of Personalized Visual Instruct Tuning☆24Updated 2 weeks ago
- Code for T-MARS data filtering☆35Updated last year
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation (arXiv: 2410.12761)☆19Updated last month
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆19Updated last year
- Officail Repo of γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆18Updated 3 weeks ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆32Updated last year
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆28Updated last week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 5 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.☆16Updated last month
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆43Updated 3 weeks ago
- ☆36Updated 9 months ago
- ☆24Updated last year
- A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.☆30Updated 4 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆23Updated 9 months ago
- This is a PyTorch implementation of the paperViP A Differentially Private Foundation Model for Computer Vision☆37Updated last year
- Official pytorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP"☆28Updated this week
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Updated last year
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".☆31Updated 2 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆22Updated last week
- ☆12Updated last month
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆53Updated 2 months ago
- ☆19Updated last month
- Training code for CLIP-FlanT5☆19Updated 3 months ago
- ☆22Updated last year
- Data-Efficient Multimodal Fusion on a Single GPU☆47Updated 6 months ago