encord-team / text-to-image-eval
Evaluate custom and HuggingFace text-to-image/zero-shot-image-classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include zero-shot accuracy, linear probe, image retrieval, and KNN accuracy.
☆51Updated 5 months ago
Alternatives and similar repositories for text-to-image-eval
Users who are interested in text-to-image-eval are comparing it to the libraries listed below.
- Estimate dataset difficulty and detect label mistakes using reconstruction error ratios!☆25Updated 5 months ago
- ☆58Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆93Updated 6 months ago
- Notebooks for fine-tuning PaliGemma☆111Updated 2 months ago
- ☆74Updated 3 months ago
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"☆244Updated 5 months ago
- Fine-tuning OpenAI CLIP Model for Image Search on medical images☆76Updated 3 years ago
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images.☆31Updated last year
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆64Updated 10 months ago
- The most impactful papers related to contrastive pretraining for multimodal models!☆67Updated last year
- Easily get basic insights about your ML dataset☆38Updated last year
- A tool for converting computer vision label formats.☆62Updated 2 months ago
- Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts"☆31Updated 8 months ago
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models".☆102Updated last year
- auto_labeler - An all-in-one library to automatically label vision data☆15Updated 5 months ago
- Supercharge Your PyTorch Image Models: Bag of Tricks to 8x Faster Inference with ONNX Runtime & Optimizations☆23Updated 8 months ago
- Timm model explorer☆40Updated last year
- Run zero-shot prediction models on your data☆32Updated 6 months ago
- Solving Computer Vision with AI agents☆33Updated last month
- Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark☆136Updated 2 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.☆73Updated 9 months ago
- ☆75Updated 8 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆101Updated 9 months ago
- [NeurIPS 2023] HASSOD: Hierarchical Adaptive Self-Supervised Object Detection☆56Updated last year
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)☆119Updated last year
- An open-source implementation for fine-tuning SmolVLM.☆40Updated last month
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆58Updated last year
- An open source implementation of CLIP (With TULIP Support)☆157Updated last month
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation☆44Updated 8 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA☆307Updated 2 weeks ago