top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor vision-language models.
☆17 · Updated last month
Alternatives and similar repositories for SPARK:
Users interested in SPARK are comparing it to the repositories listed below.
- Official Repository of Personalized Visual Instruct Tuning ☆26 · Updated 3 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆49 · Updated 4 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆18 · Updated 7 months ago
- Implementation of the model MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆20 · Updated 3 weeks ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆33 · Updated 11 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 6 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆15 · Updated 4 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling ☆32 · Updated 10 months ago
- SMILE: A Multimodal Dataset for Understanding Laughter ☆13 · Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆14 · Updated 4 months ago
- ☆38 · Updated 3 months ago
- The official repo of continuous speculative decoding ☆24 · Updated 3 months ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆20 · Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆41 · Updated 6 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- ☆12 · Updated 5 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆13 · Updated 7 months ago
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆15 · Updated 4 months ago
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation ☆22 · Updated 3 weeks ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆40 · Updated 3 weeks ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆31 · Updated 3 months ago
- ☆24 · Updated last year
- ☆10 · Updated 3 months ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆12 · Updated 2 months ago
- Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase… ☆12 · Updated 7 months ago
- Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi… ☆16 · Updated 2 months ago
- Multi-vision Sensor Perception and Reasoning (MS-PR) benchmark, assessing VLMs on their capacity for sensor-specific reasoning. ☆13 · Updated last month
- A curated list of papers and resources for text-to-image evaluation. ☆27 · Updated last year
- LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ☆17 · Updated last month
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆34 · Updated 2 months ago