Optimization-AI / fast_clip

☆25

Alternatives and similar repositories for fast_clip

Users that are interested in fast_clip are comparing it to the libraries listed below

Sorting:

wangf3014 / Patch_Scaling
Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
☆19Updated 2 months ago
mshukor / eP-ALM
[ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.
☆27Updated last year
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆32Updated last year
philippe-eecs / small-vision
A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.
☆35Updated 10 months ago
HelmholtzAI-FZJ / flex_gen
☆17Updated 4 months ago
ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆20Updated last year
locuslab / T-MARS
Code for T-MARS data filtering
☆35Updated last year
jeykigung / HiCLIP
☆29Updated 2 years ago
BaohaoLiao / mefts
[NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
☆31Updated last year
jmerullo / limber
https://arxiv.org/abs/2209.15162
☆50Updated 2 years ago
iancovert / locality-alignment
☆45Updated 3 months ago
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆44Updated 11 months ago
csarron / PuMer
[ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
☆29Updated 7 months ago
Vinoground / Vinoground
☆10Updated 6 months ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated last year
wjxts / RegularizedBN
☆21Updated 2 years ago
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆15Updated 2 months ago
facebookresearch / ViP-MAE
This is a PyTorch implementation of the paperViP A Differentially Private Foundation Model for Computer Vision
☆36Updated last year
locuslab / llava-token-compression
☆41Updated 6 months ago
ExplainableML / fomo_in_flux
Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]
☆54Updated 5 months ago
ziplab / SN-Netv2
[ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".
☆27Updated last year
archiki / RepARe
☆19Updated last year
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆17Updated 6 months ago
WangFei-2019 / SNARE
Project for SNARE benchmark
☆11Updated 11 months ago
DeepLearnXMU / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆28Updated 7 months ago
zju-vipa / training_free_model_merging
This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024).
☆29Updated last year
ggjy / vision_weak_to_strong
☆38Updated last year
mlfoundations / clip_quality_not_quantity
☆29Updated 2 years ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆52Updated last month
mlfoundations / VisIT-Bench
☆51Updated last year