facebookresearch / SIEVE
SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024)
☆14Updated 8 months ago
Alternatives and similar repositories for SIEVE:
Users who are interested in SIEVE are comparing it to the repositories listed below.
- ☆21Updated 3 months ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆20Updated last year
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆28Updated 8 months ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Updated last year
- ☆37Updated 2 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆35Updated 7 months ago
- A curated list of papers and resources for text-to-image evaluation.☆26Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆14Updated 3 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.☆16Updated 3 weeks ago
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision"☆37Updated last year
- Code for T-MARS data filtering☆35Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆50Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆17Updated 6 months ago
- Original code base for On Pretraining Data Diversity for Self-Supervised Learning☆13Updated 2 weeks ago
- (ICLR 2024, CVPR 2024) SparseFormer☆67Updated 2 months ago
- Official Repository of Personalized Visual Instruct Tuning☆26Updated 2 months ago
- [ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, M…☆21Updated last month
- Command-line tool for downloading and extending the RedCaps dataset.☆46Updated last year
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆22Updated this week
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"☆18Updated 2 years ago
- Data-Efficient Multimodal Fusion on a Single GPU☆51Updated 8 months ago
- ☆11Updated 5 months ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)☆18Updated last year
- Codebase for adaptive continual memory☆13Updated last year
- [CVPRW'23] The official PyTorch implementation of NamedMask☆23Updated last year
- This repo contains code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation"☆10Updated last week
- This is the official repo for ByteVideoLLM/Dynamic-VLM☆18Updated last month
- https://arxiv.org/abs/2209.15162☆48Updated last year
- ☆22Updated 3 months ago