top-yun / MS-PR
Multi-vision Sensor Perception and Reasoning (MS-PR) benchmark, assessing VLMs on their capacity for sensor-specific reasoning.
☆16 Updated 4 months ago
Alternatives and similar repositories for MS-PR
Users who are interested in MS-PR are comparing it to the libraries listed below.
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 Updated 8 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆27 Updated last month
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆93 Updated last week
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆60 Updated 4 months ago
- ☆68 Updated last year
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆18 Updated 6 months ago
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” ☆18 Updated last month
- XmodelLM ☆39 Updated 7 months ago
- ☆61 Updated 4 months ago
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025 ☆10 Updated 7 months ago
- ☆19 Updated 2 months ago
- VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models ☆35 Updated 3 months ago
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks ☆48 Updated 9 months ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling ☆31 Updated 8 months ago
- Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers" ☆75 Updated 3 weeks ago
- 3D Traffic Light & Sign Dataset ☆19 Updated 3 months ago
- ☆56 Updated 7 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆46 Updated 4 months ago
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024) ☆34 Updated last year
- ☆13 Updated 7 months ago
- ☆25 Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated 11 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling ☆34 Updated last year
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding. ☆25 Updated 3 weeks ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆24 Updated last week
- Resa: Transparent Reasoning Models via SAEs ☆39 Updated last month
- ☆19 Updated 4 months ago
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆33 Updated last year
- Video-LlaVA fine-tune for CinePile evaluation ☆51 Updated 11 months ago
- This is the official page of WikiAutoGen, ICCV2025 ☆15 Updated 3 weeks ago