allenai / grit_official
Official repository for the General Robust Image Task (GRIT) Benchmark
☆48 Updated last year
Related projects:
- ☆31 Updated 3 months ago
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning ☆63 Updated 2 years ago
- ☆63 Updated 11 months ago
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images ☆58 Updated 2 years ago
- Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023] ☆47 Updated 9 months ago
- ☆62 Updated 2 years ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆18 Updated last year
- ☆57 Updated last year
- ☆17 Updated last month
- A Python toolkit for the OmniLabel benchmark providing code for evaluation and visualization ☆21 Updated last month
- Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data". ☆78 Updated last year
- [ICCV 2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 Updated 11 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆103 Updated 3 weeks ago
- ☆29 Updated last year
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆54 Updated last year
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training ☆130 Updated last year
- Code release of research paper "Exploring Long-Sequence Masked Autoencoders" ☆99 Updated last year
- (ICLR 2024, CVPR 2024) SparseFormer ☆62 Updated 5 months ago
- Command-line tool for downloading and extending the RedCaps dataset. ☆45 Updated 9 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆92 Updated last week
- Open-source code for Generic Grouping Network (GGN, CVPR 2022) ☆110 Updated 5 months ago
- [CVPR 2022 (oral)] Bongard-HOI for benchmarking few-shot visual reasoning ☆64 Updated last year
- Compress conventional Vision-Language Pre-training data ☆49 Updated 11 months ago
- ☆52 Updated last year
- https://arxiv.org/abs/2209.15162 ☆48 Updated last year
- A task-agnostic vision-language architecture as a step towards General Purpose Vision ☆92 Updated 3 years ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆58 Updated 3 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆96 Updated 4 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆22 Updated 7 months ago
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling @ CVPR22 ☆42 Updated last year