microsoft/VISOR

☆46

Alternatives and similar repositories for VISOR

Users who are interested in VISOR are comparing it to the repositories listed below.

ConceptBed / evaluations
[AAAI 2024] ConceptBed Evaluations for Personalized Text-to-Image Diffusion Models
☆25 · Jun 1, 2023 · Updated 3 years ago
tejas-gokhale / ALT
☆13 · Dec 10, 2022 · Updated 3 years ago
Maitreyapatel / reliability-checklist
NLP tool for wide-range model reliability evaluations
☆12 · Jun 18, 2023 · Updated 3 years ago
Attention-Refocusing / attention-refocusing
☆133 · Jul 17, 2024 · Updated 2 years ago
duolu / CAROM
CAROM - "CARs On the Map"
☆35 · Jun 12, 2026 · Updated last month
TonyLianLong / igligen
Improved Implementation for Training GLIGEN: Open-Set Grounded Text-to-Image Generation
☆46 · Jun 1, 2024 · Updated 2 years ago
jacobswan1 / SEED
☆37 · Feb 1, 2022 · Updated 4 years ago
jacobswan1 / MTG-pytorch
Gender/age attribute grounding in a weakly supervised manner.
☆12 · Jun 23, 2019 · Updated 7 years ago
pinterest / atg-research
☆74 · Sep 23, 2025 · Updated 9 months ago
hohonu-vicml / DirectedDiffusion
Directed Diffusion: Direct Control of Object Placement through Attention Guidance (AAAI 2024)
☆82 · Feb 22, 2024 · Updated 2 years ago
Xin-Ye-1 / HRL-GRG
☆17 · Mar 26, 2021 · Updated 5 years ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆57 · Jul 25, 2023 · Updated 2 years ago
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
titu1994 / tf_GON
TensorFlow 2.x implementation of Gradient Origin Networks
☆12 · Jul 13, 2020 · Updated 6 years ago
1jsingh / Divide-Evaluate-and-Refine
Repo for our NeurIPS 2023 paper on: Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Fee…
☆27 · Nov 11, 2023 · Updated 2 years ago
hassanhub / R3Transformer
Official Python implementation of R3-Transformer
☆15 · Nov 30, 2020 · Updated 5 years ago
zhiqi-li / WechatLogger
An mmcv logger hook that can push model results to WeChat
☆21 · Oct 11, 2022 · Updated 3 years ago
mfarhadi / CNNIOT
☆49 · May 21, 2018 · Updated 8 years ago
McGill-NLP / diffusion-itm
Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
☆33 · Mar 15, 2024 · Updated 2 years ago
j-min / DallEval
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)
☆143 · Jun 10, 2025 · Updated last year
ys-zong / MIRB
Benchmarking Multi-Image Understanding in Vision and Language Models
☆11 · Jul 29, 2024 · Updated last year
bimsarapathiraja / refedit
[ICCV 2025] Official Implementation of RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model for Referring …
☆20 · Jun 27, 2025 · Updated last year
zhenyuw16 / CompAgent_code
Code release for our paper "Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation".
☆18 · Jan 30, 2024 · Updated 2 years ago
linfeng93 / Large-UniDet
A practice for million-scale multi-domain universal object detection
☆28 · Jun 13, 2024 · Updated 2 years ago
mapo-t2i / mapo
Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO).
☆83 · Jun 11, 2024 · Updated 2 years ago
Shentao-YANG / Dense_Reward_T2I
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
☆39 · May 9, 2024 · Updated 2 years ago
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
AlvinWen428 / spatial-relation-benchmark
☆15 · Oct 12, 2024 · Updated last year
McGill-NLP / AURORA
Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
☆35 · Jun 30, 2025 · Updated last year
7zk1014 / PanoEnv
☆15 · Jun 21, 2026 · Updated last month
VinAIResearch / tise-toolbox
TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation (ECCV 2022)
☆34 · Nov 12, 2024 · Updated last year
giuseppepastore10 / STRICT
Official code for the paper: "A Closer Look at Self-training for Zero-Label Semantic Segmentation" https://arxiv.org/abs/2104.11692
☆25 · Aug 22, 2021 · Updated 4 years ago
jacobswan1 / Video2Commonsense
Video captioning baseline models on Video2Commonsense Dataset.
☆56 · Apr 15, 2021 · Updated 5 years ago
DavidMChan / clair
CLAIR: A (surprisingly) simple semantic text metric with large language models.
☆22 · Jan 28, 2024 · Updated 2 years ago
liweitj47 / Attention-Heat-Map
A script to draw attention heat maps with matplotlib
☆14 · May 7, 2019 · Updated 7 years ago
ajd12342 / why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆31 · May 29, 2023 · Updated 3 years ago
shunk031 / training-free-structured-diffusion-guidance
🤗 Unofficial huggingface/diffusers-based implementation of the paper "Training-Free Structured Diffusion Guidance for Compositional Text…
☆120 · Mar 29, 2023 · Updated 3 years ago
FocalNet / FocalNet-DINO
This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO
☆68 · Mar 10, 2023 · Updated 3 years ago
kaist-ami / AVHBench
[ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"
☆25 · Mar 8, 2026 · Updated 4 months ago