UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆110 Updated 2 months ago
Related projects
Alternatives and complementary repositories for FIND
- Official repository of paper "Subobject-level Image Tokenization" ☆62 Updated 6 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 Updated 6 months ago
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ☆85 Updated 2 weeks ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆160 Updated last month
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆164 Updated 4 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆90 Updated 3 months ago
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆87 Updated 7 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆95 Updated last month
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆115 Updated last month
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆59 Updated 2 weeks ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆47 Updated 6 months ago
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆104 Updated 7 months ago
- ☆99 Updated 4 months ago
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation ☆112 Updated last month
- ☆64 Updated 4 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 Updated 4 months ago
- Official repo for StableLLAVA ☆90 Updated 10 months ago
- Matryoshka Multimodal Models ☆81 Updated last month
- Multimodal Video Understanding Framework (MVU) ☆23 Updated 5 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 Updated last year
- ☆103 Updated 3 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆120 Updated 4 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆64 Updated this week
- PyTorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models ☆186 Updated 9 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated last month
- [CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin… ☆196 Updated last month
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆123 Updated 2 months ago
- ☆88 Updated 5 months ago
- ☆145 Updated 3 weeks ago
- 🔥 stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆40 Updated 4 months ago