microsoft / vision-datasetsLinks
☆19Updated 8 months ago
Alternatives and similar repositories for vision-datasets
Users that are interested in vision-datasets are comparing it to the libraries listed below
Sorting:
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)☆16Updated last year
- ☆60Updated 2 years ago
- ☆65Updated 2 years ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆37Updated last year
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings☆46Updated 2 years ago
- REACT (CVPR 2023, Highlight 2.5%)☆140Updated 2 years ago
- [CVPR 2024 Highlight] OpenBias: Open-set Bias Detection in Text-to-Image Generative Models☆26Updated 9 months ago
- SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024)☆18Updated last year
- ☆29Updated 2 years ago
- SMILE: A Multimodal Dataset for Understanding Laughter☆13Updated 2 years ago
- Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurI…☆92Updated last year
- ☆83Updated 2 years ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆61Updated last year
- ☆87Updated last year
- ☆30Updated 2 years ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Updated 2 years ago
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models☆79Updated 6 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆24Updated last month
- Command-line tool for downloading and extending the RedCaps dataset.☆50Updated last year
- ☆35Updated last year
- Code for T-MARS data filtering☆35Updated 2 years ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆21Updated 2 years ago
- [NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222☆51Updated 2 years ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆37Updated last year
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated 2 years ago
- Methods for creating saliency maps for computer vision models.☆43Updated last year
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions☆17Updated last year
- ☆50Updated 2 years ago
- Language Quantized AutoEncoders☆111Updated 2 years ago
- https://arxiv.org/abs/2209.15162☆53Updated 2 years ago