kdexd / coco-rem
Code for the paper "Benchmarking Object Detectors with COCO: A New Path Forward."
☆26 · Updated 8 months ago
Alternatives and similar repositories for coco-rem:
Users who are interested in coco-rem are comparing it to the libraries listed below.
- ☆58 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆122 · Updated 7 months ago
- ☆23 · Updated 5 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 · Updated last year
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆35 · Updated 9 months ago
- 👆Pytorch implementation of "Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion" ☆25 · Updated 5 months ago
- Code for paper Background Prompting for Improved Object Depth ☆29 · Updated last year
- Adobe-EntitySeg dataset ☆40 · Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆44 · Updated 2 months ago
- [ICCV 2023] Learning Fine-Grained Features for Pixel-wise Video Correspondences ☆17 · Updated last year
- (ICLR 2024, CVPR 2024) SparseFormer ☆73 · Updated 4 months ago
- Personalized Representation from Personalized Generation (ICLR 2025) ☆54 · Updated 3 weeks ago
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆20 · Updated 3 months ago
- MIMIC: Masked Image Modeling with Image Correspondences ☆16 · Updated 9 months ago
- ☆105 · Updated 9 months ago
- Sora Generates Videos with Stunning Geometrical Consistency ☆49 · Updated last year
- Scaling Properties of Diffusion Models For Perceptual Tasks ☆38 · Updated 4 months ago
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆29 · Updated 4 months ago
- ☆27 · Updated 2 months ago
- ☆18 · Updated last year
- [ECCV 2024] Code for "EraseDraw: Learning to Insert Objects by Erasing Them from Images" ☆24 · Updated 3 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 · Updated 7 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆61 · Updated 3 weeks ago
- Autoregressive Image Generation with Randomized Parallel Decoding ☆25 · Updated last week
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 · Updated last year
- Implementation code of the paper MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing ☆40 · Updated 3 weeks ago
- Implementation of Zero-Shot Video Semantic Segmentation [CVPR 2025] ☆44 · Updated last month
- A curated list of papers and resources for text-to-image evaluation. ☆28 · Updated last year
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆93 · Updated 8 months ago
- ☆44 · Updated 2 years ago