Fantasyele / LLaVA-KD
☆37 · Updated this week
Related projects
Alternatives and complementary repositories for LLaVA-KD
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 · Updated this week
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆90 · Updated this week
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆90 · Updated 3 months ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition ☆69 · Updated 2 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆35 · Updated this week
- state-of-the-art open vocabulary detector on COCO/LVIS/V3Det ☆25 · Updated 6 months ago
- ☆56 · Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆33 · Updated last week
- ☆99 · Updated 5 months ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆56 · Updated 2 months ago
- ☆21 · Updated last month
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction ☆167 · Updated 9 months ago
- Official PyTorch implementation of TrackDiffusion (https://arxiv.org/abs/2312.00651) ☆63 · Updated 4 months ago
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆24 · Updated last month
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ☆63 · Updated 2 months ago
- DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention ☆113 · Updated 5 months ago
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆40 · Updated 4 months ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆59 · Updated last month
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ☆45 · Updated 3 months ago
- ☆29 · Updated 7 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆83 · Updated 3 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆116 · Updated last month
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆42 · Updated last week
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆65 · Updated 4 months ago
- ☆57 · Updated last year
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆26 · Updated 4 months ago
- Official PyTorch implementation of GeoDiffusion in ICLR 2024 (https://arxiv.org/abs/2306.04607) ☆63 · Updated 2 weeks ago
- [CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction" ☆78 · Updated 8 months ago
- 🔥ImageFolder: Autoregressive Image Generation with Folded Tokens ☆53 · Updated 3 weeks ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆55 · Updated 2 weeks ago