microsoft / vision-datasets
☆19 Updated 7 months ago
Alternatives and similar repositories for vision-datasets
Users that are interested in vision-datasets are comparing it to the libraries listed below
 - ☆29 Updated 2 years ago
 - [NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222 ☆51 Updated 2 years ago
 - ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) ☆16 Updated last year
 - Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆22 Updated last week
 - ☆65 Updated 2 years ago
 - ☆60 Updated 2 years ago
 - Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 Updated last year
 - [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 Updated 2 years ago
 - REACT (CVPR 2023, Highlight 2.5%) ☆139 Updated 2 years ago
 - Code for T-MARS data filtering ☆35 Updated 2 years ago
 - A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆19 Updated 10 months ago
 - Command-line tool for downloading and extending the RedCaps dataset. ☆49 Updated last year
 - Repository for the paper Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning ☆36 Updated 2 years ago
 - An official PyTorch implementation for CLIPPR ☆29 Updated 2 years ago
 - research work on multimodal cognitive ai ☆67 Updated 4 months ago
 - SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024) ☆17 Updated last year
 - Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆37 Updated last year
 - Official repository for the General Robust Image Task (GRIT) Benchmark ☆54 Updated 2 years ago
 - Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆25 Updated 9 months ago
 - Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆19 Updated 3 years ago
 - Language Quantized AutoEncoders ☆110 Updated 2 years ago
 - [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 Updated 2 years ago
 - Codebase for adaptive continual memory ☆13 Updated 2 years ago
 - SMILE: A Multimodal Dataset for Understanding Laughter ☆12 Updated 2 years ago
 - Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 ☆106 Updated last year
 - ☆135 Updated 2 months ago
 - In this codebase we establish a benchmark for egocentric user adaptation based on Ego4d. First, we start from a population model which ha… ☆15 Updated 9 months ago
 - ☆24 Updated 2 years ago
 - Data-Efficient Multimodal Fusion on a Single GPU ☆67 Updated last year
 - Patching open-vocabulary models by interpolating weights ☆91 Updated 2 years ago