dfan / websslLinks

Code for Scaling Language-Free Visual Representation Learning (WebSSL)

☆246

Alternatives and similar repositories for webssl

Users that are interested in webssl are comparing it to the libraries listed below

Sorting:

facebookresearch / webssl
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
☆168Updated 3 months ago
x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆211Updated 4 months ago
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆89Updated last month
kyegomez / Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
☆184Updated last week
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆342Updated last week
wysoczanska / clip_dinoiser
Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.
☆251Updated 9 months ago
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆150Updated 7 months ago
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆187Updated last year
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆101Updated 3 months ago
baaivision / DIVA
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
☆283Updated 6 months ago
Haiyang-W / GiT
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
☆350Updated 6 months ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆404Updated 5 months ago
cilinyan / VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆182Updated 11 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆134Updated 2 months ago
ggjy / DeLVM
☆118Updated last year
tsb0601 / MMVP
☆344Updated last year
yossigandelsman / clip_text_span
official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
☆217Updated 2 months ago
wangf3014 / SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
☆161Updated 9 months ago
shashankvkt / DoRA_ICLR24
This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long …
☆90Updated last year
ProvenceStar / PartGLEE
[ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
☆51Updated 10 months ago
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆75Updated 3 months ago
google-research / syn-rep-learn
Learning from synthetic data - code and models
☆319Updated last year
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆127Updated 3 months ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆78Updated 9 months ago
haochenheheda / LVVIS
Large-Vocabulary Video Instance Segmentation dataset
☆90Updated last year
facebookresearch / metamorph
Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning
☆199Updated 3 months ago
xk-huang / segment-caption-anything
[CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA) , links for downloadin…
☆227Updated 10 months ago
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆327Updated last year
bronyayang / Law_of_Vision_Representation_in_MLLMs
Official implementation of the Law of Vision Representation in MLLMs
☆163Updated 8 months ago
tulip-berkeley / open_clip
An open source implementation of CLIP (With TULIP Support)
☆162Updated 2 months ago