merveenoyan / siglip
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
☆298 · Feb 21, 2025 · Updated 11 months ago
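The repositories below build on SigLIP's core idea: replacing CLIP's batch-wide softmax contrastive loss with a pairwise sigmoid loss. A minimal NumPy sketch of that loss for orientation — the function name is mine, and the `t`/`b` initial values follow the paper; this is an illustration, not the reference implementation:

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss from SigLIP (Zhai et al., 2023) -- a sketch.

    Unlike CLIP's softmax over the whole batch, every image-text pair is
    scored independently: matching pairs (the diagonal) get label +1,
    all other pairs label -1. t and b stand in for the learnable
    temperature and bias; their init values here follow the paper.
    """
    # L2-normalize so the dot product is cosine similarity
    img_emb = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    logits = img_emb @ txt_emb.T * t + b        # (n, n) pairwise logits
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0              # +1 on diagonal, -1 elsewhere
    # -log sigmoid(label * logit), computed stably via logaddexp
    return np.logaddexp(0.0, -labels * logits).sum() / n

rng = np.random.default_rng(0)
loss = siglip_loss(rng.normal(size=(4, 64)), rng.normal(size=(4, 64)))
```

Because each pair is scored independently, the loss needs no global softmax normalization across the batch, which is what lets SigLIP-style training scale batch sizes cheaply.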
Alternatives and similar repositories for siglip
Users interested in siglip are comparing it to the repositories listed below.
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆3,355 · May 19, 2025 · Updated 8 months ago
- Implementation of the "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with… ☆36 · Jan 31, 2026 · Updated 2 weeks ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ☆1,875 · Jan 9, 2026 · Updated last month
- ☆14 · Mar 28, 2024 · Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 · Nov 11, 2024 · Updated last year
- This repository includes the code to download the curated HuggingFace papers into a single markdown-formatted file ☆16 · Jul 26, 2024 · Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ☆627 · Feb 1, 2026 · Updated 2 weeks ago
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆26 · Jan 27, 2025 · Updated last year
- ☆10 · Oct 22, 2024 · Updated last year
- Tiktok is an advanced multimedia recommender system that fuses the generative modality-aware collaborative self-augmentation and contrast… ☆14 · Aug 18, 2023 · Updated 2 years ago
- Train a production-grade GPT in less than 400 lines of code. Better than Karpathy's version and GIGAGPT ☆16 · Feb 6, 2026 · Updated last week
- Small Multimodal Vision Model "Imp-v1-3b" trained using Phi-2 and Siglip. ☆17 · Feb 5, 2024 · Updated 2 years ago
- quick playground to animate pippin ☆14 · Nov 11, 2024 · Updated last year
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆331 · Jul 4, 2025 · Updated 7 months ago
- An open source implementation of CLIP. ☆13,383 · Updated this week
- Unit Scaling demo and experimentation code ☆16 · Mar 12, 2024 · Updated last year
- Minimal Implementation of VCRec (2024) for collapse prevention. ☆18 · Jan 28, 2025 · Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆889 · Aug 13, 2024 · Updated last year
- Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurI… ☆94 · Apr 29, 2024 · Updated last year
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ☆1,426 · Oct 9, 2025 · Updated 4 months ago
- CLIP+MLP Aesthetic Score Predictor ☆1,255 · Jul 1, 2024 · Updated last year
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,659 · Updated this week
- NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,812 · Nov 27, 2025 · Updated 2 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆452 · Jan 29, 2026 · Updated 2 weeks ago
- ☆4,562 · Sep 14, 2025 · Updated 5 months ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆20 · Jan 11, 2026 · Updated last month
- Nearest Neighbor Normalization (EMNLP 2024) ☆19 · Nov 1, 2024 · Updated last year
- A forest of autonomous agents. ☆19 · Jan 27, 2025 · Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆52 · Oct 19, 2024 · Updated last year
- coded with and corrected by Google Anti-Gravity ☆13 · Nov 23, 2025 · Updated 2 months ago
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ☆278 · Jan 16, 2025 · Updated last year
- Code of paper "A new baseline for edge detection: Make Encoder-Decoder great again" ☆40 · Jun 11, 2025 · Updated 8 months ago
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ☆9,725 · Aug 12, 2024 · Updated last year
- EVA Series: Visual Representation Fantasies from BAAI ☆2,648 · Aug 1, 2024 · Updated last year
- [TMLR 2026] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models ☆122 · Feb 10, 2025 · Updated last year
- Everything about the SmolLM and SmolVLM family of models ☆3,621 · Jan 13, 2026 · Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆85 · Oct 26, 2025 · Updated 3 months ago
- A component that allows you to annotate an image with points and boxes. ☆21 · Dec 12, 2023 · Updated 2 years ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family ☆2,537 · Apr 2, 2025 · Updated 10 months ago