[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
☆33 · Jun 12, 2026 · Updated this week
Alternatives and similar repositories for Polos
Users that are interested in Polos are comparing it to the libraries listed below.
- Imagen-mini for girl image generation ☆12 · Nov 19, 2022 · Updated 3 years ago
- This is the official implementation of the paper "MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision… ☆32 · Mar 12, 2024 · Updated 2 years ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality ☆93 · Feb 13, 2024 · Updated 2 years ago
- Data release for the ImageInWords (IIW) paper. ☆225 · Nov 17, 2024 · Updated last year
- Densely Captioned Images (DCI) dataset repository. ☆197 · Jul 1, 2024 · Updated last year
- [ECCV24] Layer-Wise Relevance Propagation with Conservation Property for ResNet ☆15 · Sep 20, 2024 · Updated last year
- [IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives ☆35 · Nov 25, 2025 · Updated 6 months ago
- PyTorch implementation of "PatchGame: Learning to Signal Mid-level Patches in Referential Games" to appear in NeurIPS 2021 ☆24 · Jun 4, 2021 · Updated 5 years ago
- This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language M… ☆24 · Apr 27, 2025 · Updated last year
- Code Repository for CausalDiffAE (ECAI 2024) ☆23 · Oct 19, 2024 · Updated last year
- This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described … ☆71 · Dec 20, 2021 · Updated 4 years ago
- Related papers about Referring Image Segmentation (RIS) ☆16 · Dec 26, 2023 · Updated 2 years ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training ☆141 · Dec 16, 2025 · Updated 6 months ago
- ☆47 · Aug 26, 2025 · Updated 9 months ago
- Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurI… ☆94 · Apr 29, 2024 · Updated 2 years ago
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR … ☆293 · Jun 7, 2023 · Updated 3 years ago
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator ☆115 · Mar 21, 2025 · Updated last year
- (NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductive few-shot classification" ☆31 · Nov 21, 2021 · Updated 4 years ago
- [ICME 2019] Source code and datasets for "Semi-supervised Compatibility Learning Across Categories for Clothing Matching" ☆11 · Apr 26, 2024 · Updated 2 years ago
- final-project-level3-nlp-02 created by GitHub Classroom ☆11 · Dec 31, 2021 · Updated 4 years ago
- ☆17 · Nov 4, 2022 · Updated 3 years ago
- PyTorch implementation of "PatchVAE: Learning Local Latent Codes for Recognition" to appear in CVPR 2020 ☆14 · Apr 9, 2020 · Updated 6 years ago
- Code for DVD A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue ☆14 · Oct 12, 2021 · Updated 4 years ago
- [ACL 2023 Findings] FACTUAL dataset, the textual scene graph parser trained on FACTUAL. ☆128 · Updated this week
- ☆11 · Oct 2, 2024 · Updated last year
- M-HalDetect Dataset Release ☆29 · Nov 4, 2023 · Updated 2 years ago
- ☆15 · May 13, 2024 · Updated 2 years ago
- A list of papers and other resources on language-guided image editing. ☆39 · Jan 13, 2021 · Updated 5 years ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆36 · Aug 8, 2024 · Updated last year
- [ICCV 2025 Highlight] LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs ☆20 · Nov 16, 2025 · Updated 7 months ago
- Official Source code of "One-Shot Adaptation of GAN in Just One CLIP" IEEE Transactions on Pattern Analysis and Machine Intelligence (TPA… ☆66 · Jun 5, 2023 · Updated 3 years ago
- ☆11 · Oct 9, 2022 · Updated 3 years ago
- A currency rate converter App. ☆15 · Sep 5, 2019 · Updated 6 years ago
- (ICLR 2021) ConstellationNet: Attentional Constellation Nets for Few-Shot Learning ☆14 · Apr 4, 2022 · Updated 4 years ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆103 · Oct 23, 2024 · Updated last year
- Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel ☆11 · Oct 10, 2023 · Updated 2 years ago
- CVPR2023: Few-Shot Learning with Visual Distribution Calibration and Cross-Modal Distribution Alignment ☆14 · May 19, 2023 · Updated 3 years ago
- The official implementations of Noise-Informed Diffusion-Generated Image Detection With Anomaly Attention (TIFS 2025) ☆17 · Jun 23, 2025 · Updated 11 months ago
- port https://github.com/ChenWu98/cycle-diffusion to run on https://github.com/AUTOMATIC1111/stable-diffusion-webui ☆13 · Oct 22, 2022 · Updated 3 years ago