kdexd / coco-rem
Code for the paper "Benchmarking Object Detectors with COCO: A New Path Forward."
☆27 Updated 9 months ago
Alternatives and similar repositories for coco-rem
Users who are interested in coco-rem are comparing it to the repositories listed below.
- ☆23 Updated 6 months ago
- MIMIC: Masked Image Modeling with Image Correspondences ☆16 Updated 10 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆74 Updated 6 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆124 Updated 8 months ago
- ☆61 Updated last year
- A curated list of papers and resources for text-to-image evaluation. ☆29 Updated last year
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆76 Updated last year
- Diffusion Models as Data Mining Tools ☆54 Updated 2 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆37 Updated 10 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆50 Updated 4 months ago
- ☆19 Updated last year
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆23 Updated last month
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 Updated last year
- Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639) ☆35 Updated 2 years ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆26 Updated 3 months ago
- [ECCV 2022] Is Appearance Free Action Recognition Possible? ☆58 Updated last year
- Adobe-EntitySeg dataset ☆41 Updated last year
- ☆39 Updated last year
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 Updated last year
- Code release for "Language-conditioned Detection Transformer" ☆88 Updated 10 months ago
- ☆28 Updated 4 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆29 Updated 5 months ago
- ☆58 Updated last year
- Code for paper Background Prompting for Improved Object Depth ☆29 Updated last year
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆99 Updated last month
- Unifying Specialized Visual Encoders for Video Language Models ☆18 Updated this week
- Code for Learning to Zoom and Unzoom (CVPR 2023) ☆47 Updated last year
- ☆52 Updated 2 years ago
- Official repository of paper "Subobject-level Image Tokenization" ☆70 Updated last month
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated 9 months ago