UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆90Jun 12, 2023Updated 2 years ago
Alternatives and similar repositories for UniTAB
Users that are interested in UniTAB are comparing it to the libraries listed below
Sorting:
- code release of research paper "Exploring Long-Sequence Masked Autoencoders"☆100Oct 14, 2022Updated 3 years ago
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)☆72May 22, 2023Updated 2 years ago
- Exploiting unlabeled data with vision and language models for object detection, ECCV 2022☆94Jan 16, 2024Updated 2 years ago
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)☆210Dec 18, 2022Updated 3 years ago
- Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone☆130Oct 10, 2023Updated 2 years ago
- METER: A Multimodal End-to-end TransformER Framework☆377Nov 16, 2022Updated 3 years ago
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)☆341Jan 8, 2024Updated 2 years ago
- SeqTR: A Simple yet Universal Network for Visual Grounding☆144Oct 30, 2024Updated last year
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)☆84Nov 2, 2022Updated 3 years ago
- [ACM MM 22] Correspondence Matters for Video Referring Expression Comprehension☆15Sep 4, 2022Updated 3 years ago
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆49Nov 10, 2022Updated 3 years ago
- [CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding☆153Jul 13, 2024Updated last year
- ☆16Sep 25, 2025Updated 5 months ago
- A new video text spotting framework with Transformer☆80May 23, 2022Updated 3 years ago
- ☆1,047Oct 3, 2022Updated 3 years ago
- ☆85Dec 4, 2022Updated 3 years ago
- Unofficial implement of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection☆34Apr 18, 2022Updated 3 years ago
- ☆35Oct 21, 2023Updated 2 years ago
- [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers☆194Sep 24, 2023Updated 2 years ago
- Grounded Language-Image Pre-training☆2,580Jan 24, 2024Updated 2 years ago
- Improving One-stage Visual Grounding by Recursive Sub-query Construction, ECCV 2020☆90Sep 30, 2021Updated 4 years ago
- Pre-trained V+L Data Preparation☆46Jun 2, 2020Updated 5 years ago
- [CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"☆808Mar 20, 2024Updated 2 years ago
- [ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383☆420Oct 28, 2022Updated 3 years ago
- ☆32Mar 7, 2022Updated 4 years ago
- code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022☆268Oct 2, 2024Updated last year
- Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer (NeurIPS 2021))☆56Feb 6, 2023Updated 3 years ago
- [ECCV 2022] AMixer: Adaptive Weight Mixing for Self-attention Free Vision Transformers☆29Nov 14, 2022Updated 3 years ago
- [CVPR 2022] The code for our paper 《Object-aware Video-language Pre-training for Retrieval》☆62May 25, 2022Updated 3 years ago
- Code for IterInpaint model, presented in Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation (CVPR 2024 work…☆25Jul 21, 2024Updated last year
- ☆231Dec 18, 2023Updated 2 years ago
- Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)☆939Nov 7, 2023Updated 2 years ago
- ☆41Sep 21, 2023Updated 2 years ago
- GIT: A Generative Image-to-text Transformer for Vision and Language☆579Dec 2, 2023Updated 2 years ago
- Omnivore: A Single Model for Many Visual Modalities☆572Nov 12, 2022Updated 3 years ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆76Jan 27, 2024Updated 2 years ago
- A task-agnostic vision-language architecture as a step towards General Purpose Vision☆92Jul 14, 2021Updated 4 years ago
- It's the code for the paper Pushing the Performance Limit of Scene Text Recognizer without Human Annotation, CVPR 2022.☆28Jul 6, 2022Updated 3 years ago
- PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)☆374Jul 29, 2023Updated 2 years ago