Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
★307 · Feb 21, 2025 · Updated last year
Alternatives and similar repositories for siglip
Users who are interested in siglip are comparing it to the libraries listed below.
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ★3,419 · May 19, 2025 · Updated 10 months ago
- Hierarchical neural implicit inference over event ensembles. Code repository associated with https://arxiv.org/abs/2306.12584. ★13 · Jun 24, 2023 · Updated 2 years ago
- Recipes for shrinking, optimizing, and customizing cutting-edge vision models. ★1,908 · Jan 9, 2026 · Updated 3 months ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" ★16 · Nov 11, 2024 · Updated last year
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ★330 · Jul 4, 2025 · Updated 9 months ago
- An open source implementation of CLIP. ★13,695 · Apr 6, 2026 · Updated last week
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper: "Multimodal Contrastive Learning with… ★36 · Updated this week
- [ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ★897 · Aug 13, 2024 · Updated last year
- ★4,638 · Updated this week
- Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurI… ★94 · Apr 29, 2024 · Updated last year
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ★22 · Jan 11, 2026 · Updated 3 months ago
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ★646 · Feb 1, 2026 · Updated 2 months ago
- Small Multimodal Vision Model "Imp-v1-3b" trained using Phi-2 and SigLIP. ★17 · Feb 5, 2024 · Updated 2 years ago
- This repository includes an introduction to uncertain labels in chest X-ray diagnosis. ★10 · Oct 20, 2024 · Updated last year
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ★26 · Jan 27, 2025 · Updated last year
- ★21 · Feb 24, 2025 · Updated last year
- NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024 ★1,832 · Nov 27, 2025 · Updated 4 months ago
- OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3 ★474 · Feb 21, 2026 · Updated last month
- CLIP+MLP Aesthetic Score Predictor ★1,280 · Jul 1, 2024 · Updated last year
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ★2,667 · Apr 6, 2026 · Updated last week
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ★17 · Apr 22, 2025 · Updated 11 months ago
- This repository contains the official implementation of the research papers "MobileCLIP" (CVPR 2024) and "MobileCLIP2" (TMLR, August 2025) ★1,491 · Updated this week
- Code for "Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints" ★15 · Jan 25, 2026 · Updated 2 months ago
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ★53 · Oct 19, 2024 · Updated last year
- ★18 · Nov 11, 2022 · Updated 3 years ago
- This repository includes the code to download the curated HuggingFace papers into a single Markdown-formatted file ★16 · Jul 26, 2024 · Updated last year
- Nearest Neighbor Normalization (EMNLP 2024) ★21 · Nov 1, 2024 · Updated last year
- A forest of autonomous agents. ★20 · Jan 27, 2025 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ★87 · Oct 26, 2025 · Updated 5 months ago
- The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3" ★22 · Dec 2, 2025 · Updated 4 months ago
- EVA Series: Visual Representation Fantasies from BAAI ★2,664 · Aug 1, 2024 · Updated last year
- Everything about the SmolLM and SmolVLM family of models ★3,705 · Apr 2, 2026 · Updated 2 weeks ago
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ★285 · Jan 16, 2025 · Updated last year
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments ★13 · Jul 8, 2024 · Updated last year
- Simple implementation of TinyGPTV in super simple Zeta lego blocks ★16 · Nov 11, 2024 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ★33 · Mar 26, 2025 · Updated last year
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ★10,010 · Aug 12, 2024 · Updated last year
- Implementation of the paper "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" in PyTorch ★14 · Updated this week
- Dataset and Baselines for "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization pr… ★11 · Sep 15, 2023 · Updated 2 years ago