[CBMI 2024 Best Paper] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?".
☆31 · May 12, 2025 · Updated last year
Alternatives and similar repositories for FG-CLIP
Users that are interested in FG-CLIP are comparing it to the libraries listed below.
- [CVPR 2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detec… ☆67 · Apr 4, 2025 · Updated last year
- [WACV 2026] Official implementation of the paper: “CountingDINO: A Training-free Pipeline for Exemplar-based Class-Agnostic Counting” ☆62 · Jun 22, 2026 · Updated last week
- [ICML2024] Official PyTorch implementation of CoMC: Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition ☆17 · Jul 9, 2024 · Updated last year
- [ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset ☆87 · Aug 6, 2025 · Updated 10 months ago
- [ICCV 2025] Official repository of the paper "Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabular… ☆190 · Nov 10, 2025 · Updated 7 months ago
- [ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression ☆47 · Aug 7, 2025 · Updated 10 months ago
- Resources for our AAAI 2022 paper: "Unsupervised Editing for Counterfactual Stories". ☆12 · Oct 25, 2022 · Updated 3 years ago
- Repository for evaluating Pegasus-1 and video-language foundation models ☆14 · Nov 12, 2024 · Updated last year
- This repository contains the code for our CVPR 2024 paper, ☆15 · Aug 27, 2024 · Updated last year
- A large scale dataset for Video Captioning in Italian ☆13 · May 16, 2023 · Updated 3 years ago
- Pytorch implementation for DA-VPT (CVPR2025) ☆19 · Dec 15, 2025 · Updated 6 months ago
- A vision-language model with an improved cross-attention mechanism for scalable streaming inference ☆30 · Mar 9, 2026 · Updated 3 months ago
- [ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer. ☆14 · Mar 2, 2024 · Updated 2 years ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆18 · Apr 2, 2025 · Updated last year
- VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model ☆15 · Jul 31, 2025 · Updated 11 months ago
- Official repository for ODQA experiments from Decomposed Prompting: A Modular Approach for Solving Complex Tasks, ICLR23 ☆13 · Jul 28, 2023 · Updated 2 years ago
- Learning to Count without Annotations ☆23 · May 24, 2024 · Updated 2 years ago
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction ☆207 · Feb 5, 2024 · Updated 2 years ago
- Official implementation of the paper “Endowing Vision-Language Models with System 2 Thinking for Fine-Grained Visual Recognition,” AAAI 2… ☆42 · Jan 30, 2026 · Updated 5 months ago
- Multimodal RAG using LlamaIndex, Qdrant, llama.cpp for document QA with local VisonLLM and embedding models ☆20 · Nov 8, 2024 · Updated last year
- ☆17 · Oct 22, 2024 · Updated last year
- A simple Computer Vision Framework, mainly based on PyTorch. Including distributed training, logging and so on. ☆12 · Dec 2, 2023 · Updated 2 years ago
- ☆16 · Sep 6, 2024 · Updated last year
- [MICCAI 2023] (early accept) UOD: universal oneshot detection of anatomical landmarks. https://arxiv.org/abs/2306.07615 ☆12 · Jan 4, 2024 · Updated 2 years ago
- This is the official repository for our paper "Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning" pu… ☆47 · Apr 11, 2026 · Updated 2 months ago
- Composed Video Retrieval ☆62 · May 2, 2024 · Updated 2 years ago
- [ICLR 2025] - Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion ☆67 · Nov 30, 2025 · Updated 7 months ago
- [CVPR2025] Official implementation of RAM ☆29 · Nov 4, 2025 · Updated 8 months ago
- Multiresolution Learning-based Hybrid Transformer-CNN Model for Anatomical Landmark Detection ☆13 · Nov 5, 2023 · Updated 2 years ago
- A PyTorch implementation of NormSoftmax based on BMVC 2019 paper "Classification is a Strong Baseline for Deep Metric Learning" ☆10 · Mar 15, 2020 · Updated 6 years ago
- [ECCV2024] FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance ☆18 · Sep 11, 2024 · Updated last year
- Repo of NeurIPS23 ☆17 · Oct 25, 2023 · Updated 2 years ago
- ☆18 · Jun 14, 2024 · Updated 2 years ago
- Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement ☆17 · Nov 11, 2024 · Updated last year
- pytorch implementation of ECCV2022 "One-Shot Medical Landmark Localization by Edge-Guided Transform and Noisy Landmark Refinement" ☆10 · May 2, 2023 · Updated 3 years ago
- ☆25 · Nov 25, 2025 · Updated 7 months ago
- ☆20 · Nov 4, 2023 · Updated 2 years ago
- showing how to use CLIP-Vip to do video search ☆16 · Nov 16, 2023 · Updated 2 years ago
- Fluent student-teacher redteaming ☆23 · Jul 25, 2024 · Updated last year