umd-huang-lab / perceptionCLIPView external linksLinks
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆79May 5, 2024Updated last year
Alternatives and similar repositories for perceptionCLIP
Users that are interested in perceptionCLIP are comparing it to the libraries listed below
Sorting:
- ☆26Aug 1, 2023Updated 2 years ago
- This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…☆13May 25, 2023Updated 2 years ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆30Apr 27, 2024Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- Generating Image Specific Text☆29Aug 14, 2023Updated 2 years ago
- [ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without F…☆284Sep 28, 2023Updated 2 years ago
- [IJCV] PyTorch implementation of "Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation"☆18Oct 25, 2023Updated 2 years ago
- [IJCAI'23] Complete Instances Mining for Weakly Supervised Instance Segmentation☆38Feb 14, 2024Updated 2 years ago
- [ICCV 2023] AlignDet: Aligning Pre-training and Fine-tuning in Object Detection.☆146Sep 26, 2023Updated 2 years ago
- ☆22Jun 30, 2023Updated 2 years ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"☆247Jan 17, 2024Updated 2 years ago
- [CVPR 2023] RILS: Masked Visual Reconstruction in Language Semantic Space (https://arxiv.org/abs/2301.06958)☆44Sep 5, 2023Updated 2 years ago
- ☆46Aug 7, 2025Updated 6 months ago
- 神经辐射场 论文学习☆10Sep 25, 2021Updated 4 years ago
- ☆14Jan 5, 2022Updated 4 years ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al…☆111Sep 10, 2023Updated 2 years ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"☆288Jan 14, 2024Updated 2 years ago
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data☆14Sep 30, 2023Updated 2 years ago
- Estimate the global pose of one point-set (or graph) relative to another, using cuda☆11Apr 5, 2017Updated 8 years ago
- ☆19Jul 31, 2025Updated 6 months ago
- ego-slam☆11Jun 10, 2019Updated 6 years ago
- Relocalization algorithm☆10Sep 15, 2014Updated 11 years ago
- [ICML2022] "Identity-Disentangled Adversarial Augmentation for Self-Supervised Learning"☆10Jul 24, 2022Updated 3 years ago
- Codebase for Mechanistic Mode Connectivity☆13Jul 14, 2023Updated 2 years ago
- A Residual Network Design with less than 5 million trainable parameters achieving an accuracy of 96.04% on CIFAR-10.☆27Jul 23, 2024Updated last year
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated last year
- [NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222☆53Jun 12, 2023Updated 2 years ago
- Official PyTorch codes for "Enhancing Diffusion Models with Text-Encoder Reinforcement Learning", ECCV2024☆57Aug 13, 2024Updated last year
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"☆33Mar 15, 2024Updated last year
- VisualGPTScore for visio-linguistic reasoning☆27Oct 7, 2023Updated 2 years ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Sep 24, 2023Updated 2 years ago
- TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering☆181Apr 29, 2024Updated last year
- Adaptive Inter-Class Similarity Distillation for Semantic Segmentation (MTAP 2025)☆29Nov 14, 2025Updated 3 months ago
- 📍 Official repository of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS 2023)☆55Nov 8, 2023Updated 2 years ago
- g2o_frontend: a simple structure for handling slam with unknown data association by using the g2o backend optimization and its extensions☆15Jun 12, 2014Updated 11 years ago
- Using Gradio interface to build UI for converting text to speech☆13Jan 26, 2021Updated 5 years ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆34Sep 16, 2023Updated 2 years ago
- VaLM: Visually-augmented Language Modeling. ICLR 2023.☆56Mar 6, 2023Updated 2 years ago
- This is a PyTorch implementation of the paperViP A Differentially Private Foundation Model for Computer Vision☆36Jun 27, 2023Updated 2 years ago