bethgelab / frequency_determines_performance
Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurIPS'24]
☆75 · Updated 6 months ago
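The paper's headline finding is a log-linear scaling trend: a model's zero-shot performance on a concept improves roughly linearly as that concept's pretraining frequency grows exponentially. Below is a minimal sketch of that relation with made-up frequency/accuracy numbers, purely for illustration; nothing here comes from the repo's code.

```python
# Minimal sketch (hypothetical numbers, not from the repo): the paper's
# log-linear trend says zero-shot accuracy ~ a * log(concept frequency) + b.
import numpy as np

freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])      # pretraining occurrences of a concept
acc = np.array([0.12, 0.25, 0.38, 0.52, 0.64])  # zero-shot accuracy on that concept

# Fit accuracy against log10(frequency).
slope, intercept = np.polyfit(np.log10(freq), acc, deg=1)
print(f"accuracy ≈ {slope:.3f} * log10(freq) + {intercept:.3f}")

# Under this trend, a fixed accuracy gain requires multiplying a concept's
# pretraining frequency by a constant factor -- hence "exponential data".
```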
Related projects
Alternatives and complementary repositories for frequency_determines_performance
- Code for T-MARS data filtering ☆35 · Updated last year
- Sparse Linear Concept Embeddings ☆69 · Updated 3 months ago
- ☆38 · Updated 3 months ago
- Patching open-vocabulary models by interpolating weights ☆90 · Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning" ☆34 · Updated 8 months ago
- Un-*** 50-billion multimodality dataset ☆24 · Updated 2 years ago
- Official implementation of the paper "The Hidden Language of Diffusion Models" ☆69 · Updated 9 months ago
- Language Quantized AutoEncoders ☆94 · Updated last year
- Code for the experiments in "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆96 · Updated 2 months ago
- Evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 · Updated 4 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆70 · Updated 2 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆36 · Updated last month
- ☆43 · Updated last year
- Recursive Visual Programming ☆16 · Updated this week
- ☆30 · Updated 9 months ago
- [Under Review] Official PyTorch implementation of the technical part of Phantom of Latent, equipped with enla… ☆45 · Updated last month
- Code for the paper "CiT: Curation in Training for Effective Vision-Language Data" ☆78 · Updated last year
- Holistic evaluation of multimodal foundation models ☆41 · Updated 3 months ago
- Matryoshka Multimodal Models ☆82 · Updated this week
- What do we learn from inverting CLIP models? ☆45 · Updated 8 months ago
- ☆20 · Updated last month
- ☆33 · Updated 4 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆47 · Updated 4 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆62 · Updated 3 months ago
- ☆30 · Updated this week
- ☆64 · Updated 4 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆34 · Updated this week
- ☆29 · Updated 2 years ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆25 · Updated 4 months ago