naver-ai / egtr
[CVPR 2024 Best paper award candidate] EGTR: Extracting Graph from Transformer for Scene Graph Generation
☆55 · Updated 2 months ago
Related projects:
- Code for paper 'Leveraging Predicate and Triplet Learning for Scene Graph Generation'. (CVPR 2024) ☆22 · Updated 2 weeks ago
- [CVPR' 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆35 · Updated last month
- Large-Vocabulary Video Instance Segmentation dataset ☆73 · Updated 2 months ago
- [NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation ☆70 · Updated 2 months ago
- [ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation ☆22 · Updated last week
- ☆19 · Updated last year
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ☆72 · Updated 9 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆49 · Updated last month
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" ☆32 · Updated 2 months ago
- Official repository of paper "Subobject-level Image Tokenization" ☆58 · Updated 4 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 · Updated 3 months ago
- The official code for Devil's on the Edges: Selective Quad Attention for Scene Graph Generation, CVPR2023. ☆22 · Updated last year
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 · Updated 3 weeks ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆20 · Updated 4 months ago
- ☆32 · Updated 5 months ago
- Scene Graph Generate Zero Shot ☆17 · Updated last year
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆13 · Updated 3 months ago
- Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024) ☆41 · Updated 3 months ago
- ☆27 · Updated this week
- Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" (ICML2024) ☆18 · Updated 2 weeks ago
- [ICLR 2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models. ☆51 · Updated last month
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆35 · Updated last month
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ☆58 · Updated last week
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model (CVPR 2023) ☆80 · Updated last year
- [BMVC'21] Official PyTorch Implementation of "Grounded Situation Recognition with Transformers" ☆25 · Updated 2 years ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆49 · Updated last week
- Vision Relation Transformer for Unbiased Scene Graph Generation (ICCV 2023) ☆21 · Updated 11 months ago
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆34 · Updated 2 months ago
- [NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Grap… ☆71 · Updated 3 months ago
- ☆26 · Updated 2 months ago