Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
★304 · Feb 21, 2025 · Updated last year
Alternatives and similar repositories for siglip
Users who are interested in siglip are comparing it to the libraries listed below
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ★3,380 · May 19, 2025 · Updated 10 months ago
- Hierarchical neural implicit inference over event ensembles. Code repository associated with https://arxiv.org/abs/2306.12584. ★13 · Jun 24, 2023 · Updated 2 years ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★1,897 · Jan 9, 2026 · Updated 2 months ago
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling" ★16 · Nov 11, 2024 · Updated last year
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ★330 · Jul 4, 2025 · Updated 8 months ago
- An open source implementation of CLIP. ★13,528 · Mar 12, 2026 · Updated last week
- Implementation of the "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with… ★36 · Jan 31, 2026 · Updated last month
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ★892 · Aug 13, 2024 · Updated last year
- ★4,607 · Sep 14, 2025 · Updated 6 months ago
- Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurI… ★94 · Apr 29, 2024 · Updated last year
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ★21 · Jan 11, 2026 · Updated 2 months ago
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ★26 · Jan 27, 2025 · Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ★643 · Feb 1, 2026 · Updated last month
- Small Multimodal Vision Model "Imp-v1-3b" trained using Phi-2 and Siglip. ★17 · Feb 5, 2024 · Updated 2 years ago
- ★10 · Oct 22, 2024 · Updated last year
- ★20 · Feb 24, 2025 · Updated last year
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 ★1,824 · Nov 27, 2025 · Updated 3 months ago
- OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3 ★468 · Feb 21, 2026 · Updated last month
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ★2,661 · Mar 9, 2026 · Updated last week
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ★1,461 · Oct 9, 2025 · Updated 5 months ago
- Code for Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints ★15 · Jan 25, 2026 · Updated last month
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ★53 · Oct 19, 2024 · Updated last year
- ★18 · Nov 11, 2022 · Updated 3 years ago
- This repository includes the code to download the curated HuggingFace papers into a single markdown formatted file ★16 · Jul 26, 2024 · Updated last year
- A forest of autonomous agents. ★20 · Jan 27, 2025 · Updated last year
- Nearest Neighbor Normalization (EMNLP 2024) ★20 · Nov 1, 2024 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ★87 · Oct 26, 2025 · Updated 4 months ago
- The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3" ★21 · Dec 2, 2025 · Updated 3 months ago
- EVA Series: Visual Representation Fantasies from BAAI ★2,652 · Aug 1, 2024 · Updated last year
- Everything about the SmolLM and SmolVLM family of models ★3,675 · Jan 13, 2026 · Updated 2 months ago
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments ★13 · Jul 8, 2024 · Updated last year
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ★282 · Jan 16, 2025 · Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ★16 · Nov 11, 2024 · Updated last year
- Implementation of the paper: "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" in pytorch ★14 · Updated this week
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ★9,867 · Aug 12, 2024 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ★33 · Mar 26, 2025 · Updated 11 months ago
- Dataset and Baselines for "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization pr… ★11 · Sep 15, 2023 · Updated 2 years ago
- A spoken version of the textual story cloze benchmark ★20 · Aug 6, 2023 · Updated 2 years ago
- ★17 · Jan 2, 2024 · Updated 2 years ago