top-yun / MS-PR

Multi-vision Sensor Perception and Reasoning (MS-PR) benchmark, assessing VLMs on their capacity for sensor-specific reasoning.

☆16

Alternatives and similar repositories for MS-PR

Users who are interested in MS-PR are comparing it to the repositories listed below.

Hao840 / ADEM-VL
PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"
☆20 Updated 8 months ago
HaroldChen19 / VistaDPO
[ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
☆27 Updated last month
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆93 Updated last week
SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆60 Updated 4 months ago
XiaoduoAILab / XmodelVLM
☆68 Updated last year
top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆18 Updated 6 months ago
TIGER-AI-Lab / One-Shot-CFT
The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem”
☆18 Updated last month
XiaoduoAILab / XmodelLM
XmodelLM
☆39 Updated 7 months ago
SalesforceAIResearch / LATTE
☆61 Updated 4 months ago
opencity3d / opencity3d
Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV 2025
☆10 Updated 7 months ago
EsmaeilNarimissa / aws-sft-grpo-budget-llm-finetune
☆19 Updated 2 months ago
Event-AHU / VFM-Det
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
☆35 Updated 3 months ago
prs-eth / LoRA-Ensemble
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks
☆48 Updated 9 months ago
kkyuhun94 / dalda
[ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling
☆31 Updated 8 months ago
nickjiang2378 / test-time-registers
Official PyTorch implementation of "Vision Transformers Don't Need Trained Registers"
☆75 Updated 3 weeks ago
aimotive / aimotive_tl_ts_dataset
3D Traffic Light & Sign Dataset
☆19 Updated 3 months ago
PKU-YuanGroup / LLaVA-o1
☆56 Updated 7 months ago
shulin16 / MMInA
Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"
☆46 Updated 4 months ago
Gahyeonkim09 / AAPL
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024)
☆34 Updated last year
cyzus / thoughtsculpt
☆13 Updated 7 months ago
XavierGrool / FGFusion
☆25 Updated last year
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42 Updated 11 months ago
NVlabs / STL
Official PyTorch implementation of Self-emerging Token Labeling
☆34 Updated last year
OpenHelix-Team / CEED-VLA
Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.
☆25 Updated 3 weeks ago
agents-x-project / PyVision
Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆24 Updated last week
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆39 Updated last month
ZihanWang314 / coeCheck
☆19 Updated 4 months ago
xinghaochen / SqueezeTime
Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding"
☆33 Updated last year
mfarre / Video-LLaVA-7B-hf-CinePile
Video-LLaVA fine-tune for CinePile evaluation
☆51 Updated 11 months ago
01yzzyu / wikiautogen
This is the official page of WikiAutoGen, ICCV 2025
☆15 Updated 3 weeks ago