stogiannidis / srbench
☆11Updated 2 weeks ago
Alternatives and similar repositories for srbench
Users who are interested in srbench are comparing it to the repositories listed below
- Can 3D Vision-Language Models Truly Understand Natural Language?☆21Updated last year
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks☆57Updated 8 months ago
- Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆45Updated last year
- ☆69Updated 6 months ago
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights☆26Updated 7 months ago
- Egocentric Video Understanding Dataset (EVUD)☆29Updated 11 months ago
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"☆35Updated last year
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"☆38Updated last year
- VisualGPTScore for visio-linguistic reasoning☆27Updated last year
- ☆42Updated last year
- Code and data for "Does Spatial Cognition Emerge in Frontier Models?"☆14Updated last month
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Updated 8 months ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Updated last year
- ☆43Updated 5 months ago
- [NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation☆84Updated 11 months ago
- ☆36Updated last month
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment☆78Updated this week
- ☆29Updated 11 months ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆18Updated 7 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆85Updated 9 months ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples☆23Updated 6 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆59Updated 2 months ago
- ☆25Updated last year
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling @ CVPR22☆42Updated 2 years ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆29Updated 6 months ago
- The official implementation of JM3D.☆30Updated last month
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning☆56Updated 4 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆40Updated 6 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆64Updated this week
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆34Updated 6 months ago