UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆89 · Jun 12, 2023 · Updated 2 years ago
Alternatives and similar repositories for UniTAB
Users who are interested in UniTAB are comparing it to the repositories listed below
- Exploiting unlabeled data with vision and language models for object detection, ECCV 2022 ☆94 · Jan 16, 2024 · Updated 2 years ago
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022) ☆209 · Dec 18, 2022 · Updated 3 years ago
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022) ☆85 · Nov 2, 2022 · Updated 3 years ago
- METER: A Multimodal End-to-end TransformER Framework ☆376 · Nov 16, 2022 · Updated 3 years ago
- [CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding ☆153 · Jul 13, 2024 · Updated last year
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral) ☆72 · May 22, 2023 · Updated 2 years ago
- code release of research paper "Exploring Long-Sequence Masked Autoencoders" ☆100 · Oct 14, 2022 · Updated 3 years ago
- SeqTR: A Simple yet Universal Network for Visual Grounding ☆144 · Oct 30, 2024 · Updated last year
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024) ☆340 · Jan 8, 2024 · Updated 2 years ago
- Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone ☆131 · Oct 10, 2023 · Updated 2 years ago
- A new video text spotting framework with Transformer ☆78 · May 23, 2022 · Updated 3 years ago
- Official Codes for "Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality" ☆245 · Dec 3, 2022 · Updated 3 years ago
- ☆35 · Oct 21, 2023 · Updated 2 years ago
- [ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383 ☆421 · Oct 28, 2022 · Updated 3 years ago
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models” ☆49 · Nov 10, 2022 · Updated 3 years ago
- Pre-trained V+L Data Preparation ☆46 · Jun 2, 2020 · Updated 5 years ago
- ☆1,048 · Oct 3, 2022 · Updated 3 years ago
- Unofficial implement of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection ☆33 · Apr 18, 2022 · Updated 3 years ago
- This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO ☆68 · Mar 10, 2023 · Updated 2 years ago
- ☆16 · Sep 25, 2025 · Updated 5 months ago
- [CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining" ☆807 · Mar 20, 2024 · Updated last year
- ☆105 · Jul 7, 2023 · Updated 2 years ago
- [CVPR2022] Official Implementation of ReferFormer ☆352 · Feb 15, 2025 · Updated last year
- [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers ☆193 · Sep 24, 2023 · Updated 2 years ago
- code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022 ☆268 · Oct 2, 2024 · Updated last year
- ☆41 · Sep 21, 2023 · Updated 2 years ago
- [CVPR 2022] The code for our paper 《Object-aware Video-language Pre-training for Retrieval》 ☆62 · May 25, 2022 · Updated 3 years ago
- A new framework for open-vocabulary object detection, based on maskrcnn-benchmark ☆248 · Feb 11, 2023 · Updated 3 years ago
- Grounded Language-Image Pre-training ☆2,572 · Jan 24, 2024 · Updated 2 years ago
- Code of ICCV paper: https://arxiv.org/abs/2011.10881 ☆79 · Nov 20, 2022 · Updated 3 years ago
- Improving One-stage Visual Grounding by Recursive Sub-query Construction, ECCV 2020 ☆89 · Sep 30, 2021 · Updated 4 years ago
- Code release for SLIP Self-supervision meets Language-Image Pre-training ☆787 · Feb 9, 2023 · Updated 3 years ago
- It's the code for the paper Pushing the Performance Limit of Scene Text Recognizer without Human Annotation, CVPR 2022. ☆28 · Jul 6, 2022 · Updated 3 years ago
- (CVPR2023)Dense Distinct Query for End-to-End Object Detection ☆264 · May 24, 2023 · Updated 2 years ago
- ☆32 · Mar 7, 2022 · Updated 3 years ago
- [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning… ☆723 · Aug 8, 2023 · Updated 2 years ago
- [ACM MM 22] Correspondence Matters for Video Referring Expression Comprehension ☆15 · Sep 4, 2022 · Updated 3 years ago
- SOIT: Segmenting Objects with Instance-Aware Transformers ☆14 · Jun 6, 2022 · Updated 3 years ago
- ☆87 · Mar 4, 2024 · Updated last year