mikkoim / dinotool
Command-line tool for extracting DINOv3, CLIP, SigLIP2, and RADIO features for images and videos
☆34 Updated last week
Alternatives and similar repositories for dinotool
Users who are interested in dinotool compare it to the libraries listed below.
- ☆104 Updated 6 months ago
- This is the official code release for [LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors](https://arxiv… ☆41 Updated 11 months ago
- Cache PyTorch module outputs on-the-fly ☆44 Updated 5 months ago
- ☆77 Updated last week
- [NeurIPS 2023] HASSOD: Hierarchical Adaptive Self-Supervised Object Detection ☆58 Updated last year
- ☆31 Updated last week
- Estimate dataset difficulty and detect label mistakes using reconstruction error ratios! ☆26 Updated 9 months ago
- ☆71 Updated 2 months ago
- ☆44 Updated 8 months ago
- [NeurIPS 2025 Spotlight] "SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation." ☆117 Updated 2 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆37 Updated last year
- Induce brain-like topographic structure in your neural networks ☆69 Updated 2 months ago
- EdgeSAM model for use with Autodistill. ☆29 Updated last year
- Timm model explorer ☆42 Updated last year
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 ☆105 Updated last year
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models. ☆132 Updated last year
- ☆26 Updated 11 months ago
- A Platform for Visual Learning from Human Feedback ☆86 Updated last year
- [ICCV25] Official Implementation of LeGrad ☆80 Updated 11 months ago
- ☆59 Updated last year
- [CVPR 2025 Highlight] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation" ☆327 Updated 2 weeks ago
- PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight] ☆180 Updated 5 months ago
- [CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT). ☆416 Updated 2 weeks ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated last year
- Official code of Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning ☆246 Updated 2 weeks ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆101 Updated last year
- Scaling Vision Pre-Training to 4K Resolution ☆205 Updated last month
- ☆30 Updated last year
- Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, … ☆69 Updated 5 months ago
- ☆19 Updated last year