kirill-vish / Beyond-INet

Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"

☆101

Alternatives and similar repositories for Beyond-INet

Users who are interested in Beyond-INet are comparing it to the repositories listed below.

OliverRensu / DeepMIM
[WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling
☆53 Updated 2 months ago
OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98 Updated last year
facebookresearch / maws
Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496
☆91 Updated 3 months ago
kaiyuyue / nxtp
[CVPR'24 Highlight] PyTorch Implementation of Object Recognition as Next Token Prediction
☆180 Updated 3 months ago
Expedit-LargeScale-Vision-Transformer / Expedit-SAM
[NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…
☆85 Updated last year
facebookresearch / PartDistillation
Code release for the CVPR'23 paper "PartDistillation: Learning Parts from Instance Segmentation"
☆58 Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆100 Updated 4 months ago
iancovert / locality-alignment
☆51 Updated 6 months ago
facebookresearch / r-mae
PyTorch implementation of R-MAE https://arxiv.org/abs/2306.05411
☆113 Updated 2 years ago
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆125 Updated 11 months ago
facebookresearch / ViP-MAE
This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision"
☆36 Updated 2 years ago
mcahny / rovit
[CVPR 2023] RO-ViT: "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers"
☆18 Updated last year
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆127 Updated 3 months ago
apple / ml-tic-clip
Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models".
☆102 Updated last year
lucidrains / MaMMUT-pytorch
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch
☆103 Updated last year
ziplab / SN-Netv2
[ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".
☆28 Updated last year
TomerRonen34 / mixed-resolution-vit
☆51 Updated last year
vlfom / RNCDL
[NeurIPS 2022] The official implementation of "Learning to Discover and Detect Objects".
☆111 Updated 2 years ago
jeykigung / HiCLIP
☆29 Updated 2 years ago
bfshi / TOAST
Official code for "TOAST: Transfer Learning via Attention Steering"
☆189 Updated last year
TencentARC / ViSFT
☆34 Updated last year
janghyuncho / DECOLA
Code release for "Language-conditioned Detection Transformer"
☆87 Updated last year
OscarXZQ / weight-selection
☆182 Updated 10 months ago
umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆77 Updated last year
ChenDelong1999 / subobjects
Official repository of paper "Subobject-level Image Tokenization" (ICML-25)
☆80 Updated last month
showlab / sparseformer
(ICLR 2024, CVPR 2024) SparseFormer
☆74 Updated 8 months ago
ggjy / vision_weak_to_strong
☆38 Updated last year
ml-jku / semantic-image-text-alignment
☆24 Updated 2 years ago
bytedance / DQ-Det
Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation"
☆37 Updated last year
RAIVNLab / MIMIC
MIMIC: Masked Image Modeling with Image Correspondences
☆16 Updated last year