facebookresearch / EgoObjects
[ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding
☆75 · Updated 11 months ago
Related projects:
- Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation" ☆59 · Updated 9 months ago
- ☆57 · Updated last year
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 · Updated 3 weeks ago
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆48 · Updated last year
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆30 · Updated 10 months ago
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations" ☆50 · Updated last year
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆57 · Updated 5 months ago
- Code release for "Training a Large Video Model on a Single Machine in a Day" ☆107 · Updated last month
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning ☆63 · Updated 2 years ago
- Official repo of "VideoGUI: A Benchmark for GUI Automation from Instructional Videos" ☆19 · Updated 3 months ago
- Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆34 · Updated last month
- This repo contains the code for the recipe of the winning entry to the Ego4d VQ2D challenge at CVPR 2022. ☆39 · Updated last year
- Code for the paper "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos" published at CVPR 2024 ☆39 · Updated 6 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆62 · Updated 5 months ago
- ☆93 · Updated 3 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 · Updated 3 months ago
- ☆56 · Updated last year
- A visual LLM for image region description or QA. ☆14 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆103 · Updated 3 weeks ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆88 · Updated 2 months ago
- ☆77 · Updated 2 years ago
- [BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization ☆74 · Updated 3 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆49 · Updated last month
- Official repository of paper "Subobject-level Image Tokenization" ☆58 · Updated 4 months ago
- PyTorch implementation of "TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut" ☆56 · Updated last year
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆96 · Updated 4 months ago
- ☆52 · Updated 2 months ago
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions". ☆79 · Updated last week
- ☆31 · Updated 3 months ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆58 · Updated 3 months ago