encord-team / text-to-image-eval
Evaluate custom and Hugging Face text-to-image/zero-shot-image-classification models such as CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include zero-shot accuracy, linear probe, image retrieval, and KNN accuracy.
☆43 · Updated 2 weeks ago
Alternatives and similar repositories for text-to-image-eval:
Users who are interested in text-to-image-eval are comparing it to the libraries listed below.
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆238 · Updated last week
- ☆58 · Updated 10 months ago
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard. ☆56 · Updated this week
- auto_labeler: an all-in-one library to automatically label vision data ☆12 · Updated last week
- Estimate dataset difficulty and detect label mistakes using reconstruction error ratios! ☆18 · Updated 2 weeks ago
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability. ☆89 · Updated last month
- ☆64 · Updated 2 weeks ago
- Run zero-shot prediction models on your data ☆30 · Updated last month
- A tool for converting computer vision label formats. ☆58 · Updated 2 weeks ago
- A quick exploration of fine-tuning Florence-2 ☆292 · Updated 4 months ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆32 · Updated 3 weeks ago
- Notebooks for fine-tuning PaliGemma ☆90 · Updated last month
- Easily get basic insights about your ML dataset ☆35 · Updated last year
- Fine-tuning OpenAI's CLIP model for image search on medical images ☆75 · Updated 2 years ago
- From-scratch implementation of a vision language model in pure PyTorch ☆192 · Updated 8 months ago
- FiftyOne plugin for finding common image quality issues ☆31 · Updated 3 months ago
- ☆198 · Updated last year
- Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, … ☆67 · Updated last year
- Perform visual question answering on your images ☆16 · Updated 8 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on Hugging Face datasets. ☆59 · Updated 4 months ago
- Framework-agnostic computer vision inference. Run 1000+ models by changing only one line of code. Supports models from transformers, timm… ☆130 · Updated 2 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆35 · Updated last year
- Computer Vision dataset analysis ☆292 · Updated 5 months ago
- Data release for the ImageInWords (IIW) paper. ☆206 · Updated 2 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆61 · Updated 5 months ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆196 · Updated 7 months ago
- Testbed for multimodal retrieval augmented generation techniques with FiftyOne, LlamaIndex, and Milvus ☆17 · Updated 5 months ago
- An SDK for Transformers + YOLO and other SSD family models ☆58 · Updated this week
- My journey during 10 weeks of building FiftyOne plugins ☆21 · Updated last year
- ☆486 · Updated 2 months ago