☆46 · Oct 27, 2023 · Updated 2 years ago
Alternatives and similar repositories for VISOR
Users who are interested in VISOR are comparing it to the libraries listed below
- [WACV 2024] Training-Free Layout Control with Cross-Attention Guidance ☆266 · Mar 18, 2024 · Updated last year
- [AAAI 2024] ConceptBed Evaluations for Personalized Text-to-Image Diffusion Models ☆25 · Jun 1, 2023 · Updated 2 years ago
- [ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models" ☆13 · Aug 6, 2024 · Updated last year
- Benchmarking Multi-Image Understanding in Vision and Language Models ☆12 · Jul 29, 2024 · Updated last year
- [ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap ☆12 · Jun 18, 2025 · Updated 8 months ago
- ☆13 · Dec 10, 2022 · Updated 3 years ago
- [CVPR2023] This is an official mmdet implementation of paper "DETRs with Hybrid Matching". ☆49 · Jan 14, 2023 · Updated 3 years ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆33 · Mar 15, 2024 · Updated last year
- NLP tool for wide-range model reliability evaluations ☆12 · Jun 18, 2023 · Updated 2 years ago
- A practice for million-scale multi-domain universal object detection ☆28 · Jun 13, 2024 · Updated last year
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆57 · Jul 25, 2023 · Updated 2 years ago
- ☆14 · Oct 12, 2024 · Updated last year
- Reproduces experiments from "Grounding inductive biases in natural images: invariance stems from variations in data" ☆17 · Sep 25, 2024 · Updated last year
- CLAIR: A (surprisingly) simple semantic text metric with large language models. ☆21 · Jan 28, 2024 · Updated 2 years ago
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24). ☆40 · May 9, 2024 · Updated last year
- Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO). ☆82 · Jun 11, 2024 · Updated last year
- Directed Diffusion: Direct Control of Object Placement through Attention Guidance (AAAI2024) ☆81 · Feb 22, 2024 · Updated 2 years ago
- ☆24 · Nov 29, 2023 · Updated 2 years ago
- An mmcv logger hook that can push model results to WeChat ☆21 · Oct 11, 2022 · Updated 3 years ago
- PyTorch implementation of Refine and Represent: Region-to-Object Representation Learning. ☆21 · Jun 19, 2025 · Updated 8 months ago
- Up-to-date Vision Language Models collection. Mainly focus on computer vision ☆19 · Feb 9, 2023 · Updated 3 years ago
- Official implementation of the paper "Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synth… ☆93 · Oct 2, 2023 · Updated 2 years ago
- Official code for the paper: "A Closer Look at Self-training for Zero-Label Semantic Segmentation" https://arxiv.org/abs/2104.11692 ☆25 · Aug 22, 2021 · Updated 4 years ago
- Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text ☆24 · Aug 15, 2022 · Updated 3 years ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆20 · Mar 28, 2024 · Updated last year
- My personal toolbox for doing datascience (especially deep learning) in python. ☆18 · Mar 21, 2020 · Updated 5 years ago
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection" ☆27 · Jul 10, 2023 · Updated 2 years ago
- Repo for our NeurIPS 2023 paper on: Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Fee… ☆27 · Nov 11, 2023 · Updated 2 years ago
- Code and data for "Does Spatial Cognition Emerge in Frontier Models?" ☆27 · Apr 18, 2025 · Updated 10 months ago
- ☆24 · Oct 9, 2023 · Updated 2 years ago
- VaLM: Visually-augmented Language Modeling. ICLR 2023. ☆56 · Mar 6, 2023 · Updated 2 years ago
- This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos, which has bee… ☆53 · Apr 14, 2023 · Updated 2 years ago
- [NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation ☆330 · Dec 24, 2025 · Updated 2 months ago
- [NeurIPS 2022 Spotlight] Learning Equivariant Segmentation with Instance-Unique Querying ☆22 · Dec 17, 2022 · Updated 3 years ago
- Some papers about *diverse* image (a few videos) captioning ☆26 · Apr 4, 2023 · Updated 2 years ago
- 🔥 [CVPR2024] Official implementation of "Self-correcting LLM-controlled Diffusion Models" (SLD) ☆187 · Apr 9, 2024 · Updated last year
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆33 · Jun 30, 2025 · Updated 7 months ago
- FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding (NIPS24) ☆34 · Nov 12, 2025 · Updated 3 months ago
- The SVO-Probes Dataset for Verb Understanding ☆31 · Jan 28, 2022 · Updated 4 years ago