microsoft / vision-datasets
☆19 Updated 5 months ago
Alternatives and similar repositories for vision-datasets
Users who are interested in vision-datasets are comparing it to the libraries listed below.
- ☆65 Updated last year
- REACT (CVPR 2023, Highlight 2.5%) ☆138 Updated 2 years ago
- [NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222 ☆51 Updated 2 years ago
- Language Quantized AutoEncoders ☆109 Updated 2 years ago
- ☆59 Updated 2 years ago
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) ☆16 Updated last year
- ☆84 Updated 2 years ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron… ☆16 Updated 9 months ago
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆54 Updated 2 years ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆23 Updated 2 weeks ago
- Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurI… ☆90 Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆36 Updated last year
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆61 Updated 10 months ago
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models" ☆26 Updated last week
- [CVPR 2024 Highlight] OpenBias: Open-set Bias Detection in Text-to-Image Generative Models ☆25 Updated 6 months ago
- research work on multimodal cognitive ai ☆66 Updated 2 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆36 Updated last year
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ☆78 Updated 3 months ago
- Code for T-MARS data filtering ☆35 Updated 2 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch ☆103 Updated last year
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 Updated last year
- https://arxiv.org/abs/2209.15162 ☆52 Updated 2 years ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆30 Updated 3 months ago
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models". ☆103 Updated last year
- Code and data from the paper 'Human Feedback is not Gold Standard' ☆19 Updated last year
- ☆50 Updated last year
- ☆86 Updated last year
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆83 Updated 3 weeks ago
- Model Stock: All we need is just a few fine-tuned models ☆122 Updated 3 weeks ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆46 Updated 6 months ago