google-research / silc
[ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation
☆35 Updated last month
Related projects
Alternatives and complementary repositories for silc
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆110 Updated 2 months ago
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ☆84 Updated 3 weeks ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆160 Updated last month
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 Updated 6 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 Updated 4 months ago
- ☆33 Updated 3 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆47 Updated 6 months ago
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆105 Updated 7 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆24 Updated 4 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆69 Updated last month
- Matryoshka Multimodal Models ☆81 Updated last month
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆56 Updated 2 months ago
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … ☆54 Updated 3 weeks ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆95 Updated 2 months ago
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models". ☆93 Updated 5 months ago
- Multimodal Video Understanding Framework (MVU) ☆23 Updated 5 months ago
- Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des… ☆52 Updated 4 months ago
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆87 Updated 7 months ago
- Official repository of paper "Subobject-level Image Tokenization" ☆62 Updated 6 months ago
- ☆29 Updated 3 weeks ago
- Official Repository of Personalized Visual Instruct Tuning ☆23 Updated last week
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆68 Updated 5 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆38 Updated 3 months ago
- ☆20 Updated 3 weeks ago
- [CVPR 2024 Highlight] ImageNet-D ☆38 Updated 3 weeks ago
- ☆12 Updated 3 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆16 Updated last month
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆45 Updated last month
- More dimensions = More fun ☆21 Updated 3 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆32 Updated 4 months ago