kdexd / coco-rem
Code for the paper "Benchmarking Object Detectors with COCO: A New Path Forward."
☆32 Updated last year
Alternatives and similar repositories for coco-rem
Users who are interested in coco-rem are comparing it to the libraries listed below.
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆77 Updated 2 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆129 Updated last year
- Adobe-EntitySeg dataset ☆43 Updated 2 years ago
- Test-Time Training on Video Streams ☆66 Updated 2 years ago
- PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight] ☆181 Updated 8 months ago
- ☆26 Updated last year
- A curated list of papers and resources for text-to-image evaluation. ☆30 Updated 2 years ago
- Unifying Specialized Visual Encoders for Video Language Models ☆24 Updated last month
- (ICLR 2024, CVPR 2024) SparseFormer ☆75 Updated last year
- ☆120 Updated last year
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation ☆128 Updated last year
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆101 Updated 9 months ago
- 1-shot image segmentation using Stable Diffusion ☆142 Updated last year
- ☆58 Updated 2 years ago
- Diffusion Models as Data Mining Tools ☆56 Updated 7 months ago
- Simple script to parallelize download and extract files for SA-1B Dataset. ☆37 Updated 6 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 Updated 5 months ago
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆25 Updated this week
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆99 Updated last year
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 Updated last year
- Code release for "Language-conditioned Detection Transformer" ☆88 Updated last year
- ☆19 Updated 2 years ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated last year
- PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025 ☆14 Updated last month
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 Updated 10 months ago
- Image Tokenizer Needs Post-Training ☆24 Updated 3 months ago
- [ICCV 2025] TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆37 Updated last year
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ☆134 Updated 8 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆46 Updated last year
- [CVPRW'23] The official PyTorch implementation of NamedMask ☆23 Updated 2 years ago