OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Learners"
☆96 Updated 4 months ago
Related projects:
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 Updated 3 weeks ago
- Official repository of paper "Subobject-level Image Tokenization" ☆58 Updated 4 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆103 Updated 3 weeks ago
- ☆93 Updated 3 months ago
- ☆52 Updated last year
- Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023] ☆47 Updated 9 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆88 Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆85 Updated last week
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆75 Updated 5 months ago
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆84 Updated 5 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆92 Updated last week
- ☆57 Updated last year
- ☆100 Updated last month
- ☆40 Updated 3 months ago
- Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders ☆86 Updated last month
- Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des… ☆47 Updated 2 months ago
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆54 Updated last year
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ☆72 Updated 9 months ago
- Large-Vocabulary Video Instance Segmentation dataset ☆73 Updated 2 months ago
- ☆17 Updated 5 months ago
- ☆31 Updated 3 months ago
- [NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?" ☆160 Updated 6 months ago
- ☆36 Updated 4 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 Updated 3 months ago
- Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper. ☆168 Updated 2 months ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆85 Updated 6 months ago
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆152 Updated 2 months ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 Updated 3 weeks ago
- [CVPR'23] Hard Patches Mining for Masked Image Modeling ☆86 Updated 8 months ago
- [ECCV 2024] The official repo of Griffon series ☆93 Updated 2 months ago