kyegomez / BRAVE-ViT-SwarmLinks

Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models"

☆25

Alternatives and similar repositories for BRAVE-ViT-Swarm

Users that are interested in BRAVE-ViT-Swarm are comparing it to the libraries listed below

Sorting:

alhojel / visual_task_vectors
☆40Updated last year
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024
☆64Updated 2 weeks ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆36Updated 6 months ago
Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆126Updated last year
AtsuMiyai / UPD
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆78Updated 5 months ago
sehyunkwon / ICTC
This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)
☆91Updated last year
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31Updated 6 months ago
apple / ml-tic-clip
Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024
☆106Updated last year
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆125Updated last year
iancovert / locality-alignment
☆53Updated 9 months ago
bfshi / TOAST
Official code for "TOAST: Transfer Learning via Attention Steering"
☆186Updated 2 years ago
umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆79Updated last year
isekai-portal / Link-Context-Learning
☆99Updated last year
UCDvision / NOLA
Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"
☆56Updated last year
Gahyeonkim09 / AAPL
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024)
☆34Updated last year
OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98Updated last year
nickjiang2378 / test-time-registers
[NeurIPS '25 Spotlight] Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers"
☆131Updated last month
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆35Updated last year
Picsart-AI-Research / OpenBias
[CVPR 2024 Highlight] OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
☆24Updated 8 months ago
EPFL-VILAB / fm-vision-evals
☆72Updated 3 months ago
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆180Updated 6 months ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆49Updated 4 months ago
EvolvingLMMs-Lab / multimodal-sae
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆159Updated last month
lambert-x / ProLab
Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…
☆55Updated 2 months ago
kirill-vish / Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
☆101Updated last year
DCDmllm / HyperLLaVA
Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆27Updated last year
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆103Updated last year
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆116Updated 3 months ago
top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆19Updated 10 months ago
hananshafi / llmblueprint
[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"
☆82Updated last year