sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆24 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for PVIT
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation (arXiv: 2410.12761) ☆19 Updated last month
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆43 Updated 11 months ago
- ☆12 Updated last month
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆32 Updated 8 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆32 Updated 5 months ago
- ☆17 Updated 4 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆52 Updated last year
- Official implement of MIA-DPO ☆40 Updated 2 weeks ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆39 Updated 3 months ago
- ☆13 Updated 3 weeks ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆47 Updated this week
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆40 Updated last month
- ☆27 Updated last week
- [CVPR 2024 Highlight] ImageNet-D ☆38 Updated last month
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆26 Updated 5 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆28 Updated last week
- ☆36 Updated last month
- The official repo of continuous speculative decoding ☆16 Updated this week
- ☆38 Updated 4 months ago
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆26 Updated 2 weeks ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆42 Updated 3 weeks ago
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models - Google ☆33 Updated 3 months ago
- Video Diffusion State Space Models ☆19 Updated 7 months ago
- Official Repo for Paper "OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision" ☆31 Updated last week
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆39 Updated 3 months ago
- Official Implementation for "Editing Massive Concepts in Text-to-Image Diffusion Models" ☆17 Updated 8 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆23 Updated 9 months ago
- FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax ☆18 Updated 11 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆41 Updated 3 weeks ago