UX-Decoder / FIND

[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"

☆128

Alternatives and similar repositories for FIND

Users who are interested in FIND compare it to the libraries listed below.

OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98 Updated last year
ChenDelong1999 / subobjects
Official repository of paper "Subobject-level Image Tokenization" (ICML-25)
☆90 Updated 5 months ago
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆181 Updated 7 months ago
bytedance / OmniScient-Model
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
☆98 Updated last year
TencentARC / ViT-Lens
[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
☆184 Updated 10 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆62 Updated 4 months ago
showlab / sparseformer
(ICLR 2024, CVPR 2024) SparseFormer
☆75 Updated last year
V3Det / V3Det
☆114 Updated last year
aliasgharkhani / SLiMe
1-shot image segmentation using Stable Diffusion
☆142 Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆101 Updated 8 months ago
see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆45 Updated last year
Jiahao000 / MosaicFusion
[IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
☆128 Updated last year
YuchenLiu98 / COMM
PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"
☆205 Updated 10 months ago
renwang435 / video-ttt-release
Test-Time Training on Video Streams
☆64 Updated 2 years ago
lambert-x / ProLab
Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…
☆56 Updated 3 months ago
Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆129 Updated 3 weeks ago
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆134 Updated 7 months ago
janghyuncho / DECOLA
Code release for "Language-conditioned Detection Transformer"
☆88 Updated last year
AILab-CVC / VL-GPT
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
☆86 Updated last year
facebookresearch / EgoObjects
[ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding
☆77 Updated 2 years ago
mightyzau / RegionBLIP
☆58 Updated 2 years ago
congvvc / LaSagnA
Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries".
☆61 Updated last year
wysoczanska / clip_dinoiser
Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.
☆263 Updated last year
isekai-portal / Link-Context-Learning
☆100 Updated last year
xk-huang / segment-caption-anything
[CVPR'24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin…
☆231 Updated last year
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆158 Updated 11 months ago
facebookresearch / maws
Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496
☆91 Updated 7 months ago
haochenheheda / LVVIS
Large-Vocabulary Video Instance Segmentation dataset
☆95 Updated last year
zhouyiks / CoLVA
☆40 Updated 4 months ago
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆94 Updated 9 months ago