Aasthaengg / GLIP-BLIP-Vision-Langauge-Obj-Det-VQA
☆31 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for GLIP-BLIP-Vision-Langauge-Obj-Det-VQA
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆34 · Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆50 · Updated last year
- A task-agnostic vision-language architecture as a step towards General Purpose Vision ☆92 · Updated 3 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch ☆97 · Updated last year
- Fine-tuning the OpenAI CLIP model for image search on medical images ☆74 · Updated 2 years ago
- Code for the AAAI 2023 paper “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 · Updated last year
- [ECCV 2024] The official repository of the Griffon series ☆102 · Updated this week
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆58 · Updated 10 months ago
- [NeurIPS 2023] HASSOD: Hierarchical Adaptive Self-Supervised Object Detection ☆49 · Updated 9 months ago
- ALIGN trained on the COYO dataset ☆28 · Updated 6 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆160 · Updated last month
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training ☆132 · Updated last year
- EdgeSAM model for use with Autodistill ☆25 · Updated 4 months ago
- Vision-oriented multimodal AI ☆49 · Updated 4 months ago
- Code for the experiments in "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆95 · Updated last month
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆22 · Updated 10 months ago
- A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating… ☆107 · Updated 7 months ago
- Tracking through Containers and Occluders in the Wild (CVPR 2023): official implementation ☆39 · Updated 5 months ago
- A FiftyOne plugin that lets you search across any modality in your videos ☆15 · Updated 11 months ago
- Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, … ☆67 · Updated last year
- Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆144 · Updated this week
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated last year
- CVPR 2023 paper ☆50 · Updated last year
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B… ☆81 · Updated 2 months ago