alhojel / visual_task_vectors
☆33Updated 3 months ago
Related projects
Alternatives and complementary repositories for visual_task_vectors
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆24Updated 4 months ago
- Official repository of paper "Subobject-level Image Tokenization"☆62Updated 6 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"☆110Updated 2 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆34Updated 8 months ago
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)☆104Updated 7 months ago
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation (arXiv: 2410.12761)☆17Updated 3 weeks ago
- [CVPR24] Official Implementation of GEM (Grounding Everything Module)☆85Updated 2 weeks ago
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.☆87Updated 7 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks☆59Updated last month
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…☆98Updated 6 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation"☆28Updated last week
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis"☆33Updated 4 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation".☆19Updated 3 weeks ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆52Updated last year
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆40Updated 3 months ago
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"☆54Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆33Updated 2 months ago
- More dimensions = More fun☆21Updated 3 months ago
- Official implementation of the paper The Hidden Language of Diffusion Models☆69Updated 9 months ago
- Multimodal Video Understanding Framework (MVU)☆23Updated 5 months ago
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model (CVPR 2023)☆82Updated last year
- Augmenting with Language-guided Image Augmentation (ALIA)☆62Updated last year
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆69Updated last month