Computer-Vision-in-the-Wild/CVinW_Readings

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Computer-Vision-in-the-Wild/CVinW_Readings)

Computer-Vision-in-the-Wild / CVinW_Readings

A collection of papers on the topic of ``Computer Vision in the Wild (CVinW)''

☆1,372

Alternatives and similar repositories for CVinW_Readings

Users that are interested in CVinW_Readings are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,958Updated this week
microsoft / GLIP
View on GitHub
Grounded Language-Image Pre-training
☆2,605Jan 24, 2024Updated 2 years ago
salesforce / LAVIS
View on GitHub
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,257Jun 2, 2026Updated last month
haotian-liu / LLaVA
View on GitHub
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,950Aug 12, 2024Updated last year
mlfoundations / open_clip
View on GitHub
An open source implementation of CLIP.
☆14,031Jul 17, 2026Updated last week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
DirtyHarryLYL / LLM-in-Vision
View on GitHub
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆871Mar 8, 2025Updated last year
ttengwang / Awesome_Prompting_Papers_in_Computer_Vision
View on GitHub
A curated list of prompt-based paper in computer vision and vision-language learning.
☆927Dec 18, 2023Updated 2 years ago
Computer-Vision-in-the-Wild / Elevater_Toolkit_IC
View on GitHub
Toolkit for Elevater Benchmark
☆78Oct 17, 2023Updated 2 years ago
microsoft / X-Decoder
View on GitHub
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346Oct 5, 2023Updated 2 years ago
IDEA-Research / awesome-detection-transformer
View on GitHub
Collect some papers about transformer for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)
☆1,399Jul 4, 2024Updated 2 years ago
LLaVA-VL / LLaVA-NeXT
View on GitHub
☆4,713Jun 15, 2026Updated last month
JIA-Lab-research / LISA
View on GitHub
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,671Feb 16, 2025Updated last year
IDEA-Research / Grounded-Segment-Anything
View on GitHub
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and …
☆17,687Sep 5, 2024Updated last year
KaiyangZhou / CoOp
View on GitHub
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
☆2,218May 20, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
IDEA-Research / OpenSeeD
View on GitHub
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆763Jan 22, 2024Updated 2 years ago
IDEA-Research / GroundingDINO
View on GitHub
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
☆10,454Aug 12, 2024Updated last year
jianzongwu / Awesome-Open-Vocabulary
View on GitHub
(TPAMI 2024) A Survey on Open Vocabulary Learning
☆999May 12, 2026Updated 2 months ago
mlfoundations / open_flamingo
View on GitHub
An open-source framework for training large multimodal models.
☆4,116Aug 31, 2024Updated last year
awaisrauf / Awesome-CV-Foundational-Models
View on GitHub
☆550Nov 7, 2024Updated last year
UX-Decoder / Segment-Everything-Everywhere-All-At-Once
View on GitHub
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
☆4,795Aug 19, 2024Updated last year
VainF / Awesome-Anything
View on GitHub
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
☆1,852Nov 15, 2023Updated 2 years ago
baaivision / EVA
View on GitHub
EVA Series: Visual Representation Fantasies from BAAI
☆2,685Aug 1, 2024Updated last year
yzhuoning / Awesome-CLIP
View on GitHub
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,229Jun 28, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507Aug 9, 2024Updated last year
baaivision / Painter
View on GitHub
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,593Dec 6, 2024Updated last year
rom1504 / img2dataset
View on GitHub
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
☆4,438Oct 19, 2025Updated 9 months ago
mbzuai-oryx / groundingLMM
View on GitHub
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964Aug 5, 2025Updated 11 months ago
microsoft / RegionCLIP
View on GitHub
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
☆816Mar 20, 2024Updated 2 years ago
cambrian-mllm / cambrian
View on GitHub
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,009Nov 7, 2025Updated 8 months ago
baaivision / Emu
View on GitHub
Emu Series: Generative Multimodal Models from BAAI
☆1,776Jan 12, 2026Updated 6 months ago
gligen / GLIGEN
View on GitHub
Open-Set Grounded Text-to-Image Generation
☆2,226Mar 6, 2024Updated 2 years ago
facebookresearch / MetaCLIP
View on GitHub
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,849Nov 27, 2025Updated 8 months ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
IDEA-Research / detrex
View on GitHub
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
☆2,302Sep 11, 2025Updated 10 months ago
KMnP / vpt
View on GitHub
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
☆1,240Sep 2, 2023Updated 2 years ago
wusize / ovdet
View on GitHub
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
☆187Oct 25, 2023Updated 2 years ago
TheShadow29 / awesome-grounding
View on GitHub
awesome grounding: A curated list of research papers in visual grounding
☆1,126Sep 21, 2025Updated 10 months ago
google-research / big_vision
View on GitHub
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,501May 19, 2025Updated last year
jingyi0000 / VLM_survey
View on GitHub
Collection of AWESOME vision-language models for vision tasks
☆3,130Oct 14, 2025Updated 9 months ago
openai / CLIP
View on GitHub
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
☆34,083Mar 25, 2026Updated 4 months ago