allenai / visprog
Official code for VisProg (CVPR 2023 Best Paper!)
☆758 · Updated last year
Alternatives and similar repositories for visprog
Users interested in visprog are comparing it to the repositories listed below.
- Official Repository of ChatCaptioner ☆467 · Updated 2 years ago
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch ☆1,273 · Updated 3 years ago
- ☆805 · Updated last year
- A collection of papers on the topic of "Computer Vision in the Wild (CVinW)" ☆1,354 · Updated last year
- Recent LLM-based CV and related works. Welcome to comment/contribute! ☆873 · Updated 11 months ago
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). ☆1,233 · Updated last year
- Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight] ☆933 · Updated last year
- [CVPR 2023] Official implementation of X-Decoder for generalized decoding for pixel, image and language ☆1,342 · Updated 2 years ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ☆551 · Updated 8 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆943 · Updated 6 months ago
- Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22 ☆468 · Updated 2 years ago
- Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022. ☆781 · Updated 3 years ago
- Official open-source code for "Scaling Language-Image Pre-training via Masking" ☆427 · Updated 2 years ago
- Code release for "Learning Video Representations from Large Language Models" ☆536 · Updated 2 years ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ☆504 · Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆580 · Updated 2 years ago
- [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want ☆866 · Updated 6 months ago
- Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion) ☆939 · Updated 2 years ago
- ☆242 · Updated 8 months ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". ☆485 · Updated 2 years ago
- Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. ☆457 · Updated 2 years ago
- [CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining" ☆807 · Updated last year
- ☆643 · Updated last year
- VisionLLM Series ☆1,137 · Updated 11 months ago
- CLIP-like model evaluation ☆800 · Updated 3 weeks ago
- ☆546 · Updated last year
- Robust fine-tuning of zero-shot models ☆760 · Updated 3 years ago
- DataComp: In search of the next generation of multimodal datasets ☆768 · Updated 9 months ago
- This is the official repository for the LENS (Large Language Models Enhanced to See) system. ☆356 · Updated 6 months ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆471 · Updated 2 years ago