Jiahao000 / VICT
[CVPR 2025] Test-Time Visual In-Context Tuning
☆25 Updated 9 months ago
Alternatives and similar repositories for VICT
Users who are interested in VICT are comparing it to the libraries listed below.
- ☆41 Updated 5 months ago
- Visual Spatial Tuning ☆157 Updated this week
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆166 Updated 2 weeks ago
- [CVPR 2024 Highlight] ImageNet-D ☆46 Updated last year
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆46 Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 Updated 10 months ago
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ☆56 Updated last year
- ☆65 Updated last month
- T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation ☆35 Updated 3 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆40 Updated 10 months ago
- ReNeg: Learning Negative Embedding with Reward Guidance ☆35 Updated last week
- ☆35 Updated last week
- Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation (ICCV 2023) ☆66 Updated 2 years ago
- ☆39 Updated 2 years ago
- (ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation ☆37 Updated 2 years ago
- ☆43 Updated 7 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆51 Updated 6 months ago
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper ☆94 Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆89 Updated last year
- Test-Time Training on Video Streams ☆66 Updated 2 years ago
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure? ☆161 Updated 2 weeks ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆35 Updated last year
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World" ☆119 Updated 2 months ago
- SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality ☆35 Updated last year
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆28 Updated last year
- Official implementation of "Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive" (ICLR 2024) ☆57 Updated last year
- Empowering Unified MLLM with Multi-granular Visual Generation ☆130 Updated 11 months ago
- Official implementation of LaVin-DiT ☆49 Updated 11 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆99 Updated last year
- ☆58 Updated 2 years ago