KAIST-Visual-AI-Group / APC-VLM
Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
☆21 · Updated last week
Alternatives and similar repositories for APC-VLM:
Users who are interested in APC-VLM are comparing it to the repositories listed below
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆46 · Updated 5 months ago
- ☆33 · Updated 2 months ago
- ☆14 · Updated 3 weeks ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆16 · Updated 2 months ago
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆16 · Updated 6 months ago
- ☆14 · Updated 6 months ago
- Official implementation of "Reangle-A-Video: 4D Video Generation as Video-to-Video Translation" ☆38 · Updated last month
- Official PyTorch implementation of "SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering" ☆30 · Updated last month
- The official repository of paper "ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection" (N… ☆50 · Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆17 · Updated 6 months ago
- The official code for MedAgent_Pro ☆19 · Updated 2 weeks ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆35 · Updated 10 months ago
- Official implementation of "URECA : Unique Region Caption Anything" ☆42 · Updated 3 weeks ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆37 · Updated 10 months ago
- Code for IterInpaint model, presented in Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation (CVPR 2024 work… ☆25 · Updated 9 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆39 · Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆28 · Updated 2 months ago
- ☆23 · Updated 6 months ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection". ☆37 · Updated 7 months ago
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆18 · Updated last month
- Official implementation of the WACV 2025 paper "3D Part Segmentation via Geometric Aggregation of 2D Visual Features" ☆17 · Updated last month
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆45 · Updated 5 months ago
- [ECCV'24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models" ☆19 · Updated last month
- [CVPR 2024 Highlight] ImageNet-D ☆42 · Updated 6 months ago
- ☆28 · Updated 3 months ago
- [ECCV 2024] R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations ☆10 · Updated 9 months ago
- ☆14 · Updated last year
- ☆10 · Updated 3 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆22 · Updated 6 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆39 · Updated 2 months ago