liunian-harold-li / DesCo

☆30

Alternatives and similar repositories for DesCo

Users who are interested in DesCo are comparing it to the libraries listed below.

allenai / reclip
☆86 Updated 3 years ago
sail-sg / ptp
[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
☆152 Updated 2 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆87 Updated 2 years ago
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆36 Updated last year
Shengcao-Cao / groundLMM
Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
☆41 Updated 4 months ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
LeapLabTHU / Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
☆150 Updated last year
seanzhuh / SeqTR
SeqTR: A Simple yet Universal Network for Visual Grounding
☆140 Updated 9 months ago
luogen1996 / SimREC
A lightweight codebase for referring expression comprehension and segmentation
☆55 Updated 3 years ago
JacobYuan7 / RLIP
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Grap…
☆77 Updated last year
thunlp / PEVL
Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”
☆48 Updated 2 years ago
palchenli / VL-Instruction-Tuning
☆91 Updated last year
shikras / d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating…
☆128 Updated last year
salesforce / PB-OVD
A pytorch Implementation of Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
☆61 Updated 2 years ago
vishaal27 / SuS-X
Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]
☆103 Updated last year
ubc-vision / RefTR
Official Implementation for paper "Referring Transformer: A One-step Approach to Multi-task Visual Grounding" NeurIPS 2021
☆68 Updated 3 years ago
kingthreestones / RefCLIP
☆36 Updated 2 years ago
Cuberick-Orion / CIRR
Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
☆117 Updated 2 months ago
SivanDoveh / TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
☆46 Updated last year
Monoxide-Chen / uncertainty_retrieval
ICLR'24 Official Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
☆73 Updated last year
pals-ttic / adapting-CLIP
☆64 Updated last year
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 Updated 2 years ago
Yuqifan1117 / CaCao
This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"…
☆47 Updated last year
YYJMJC / LOUPE
☆45 Updated last year
microsoft / LAVENDER
A Unified Framework for Video-Language Understanding
☆57 Updated 2 years ago
xuanlinli17 / large_vlm_distillation_ood
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)
☆58 Updated last year
zhjohnchan / SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
☆31 Updated 2 years ago
vinid / neg_clip
NegCLIP.
☆34 Updated 2 years ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆78 Updated 9 months ago
mzhaoshuai / CenterCLIP
[SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p…
☆132 Updated 3 years ago