An benchmark for evaluating the capabilities of large vision-language models (LVLMs)
☆46Nov 17, 2023Updated 2 years ago
Alternatives and similar repositories for ReForm-Eval
Users that are interested in ReForm-Eval are comparing it to the libraries listed below
Sorting:
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"☆23Mar 4, 2025Updated last year
- A collection of resources that investigate social agents.☆219Apr 22, 2025Updated 10 months ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Nov 11, 2024Updated last year
- ☆23Oct 30, 2025Updated 4 months ago
- The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers"☆19Mar 11, 2024Updated last year
- [NeurIPS 2024] PyTorch code for the paper "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning…☆23Oct 24, 2025Updated 4 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆45Jun 14, 2024Updated last year
- ☆66Feb 5, 2024Updated 2 years ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- ☆21Oct 10, 2023Updated 2 years ago
- The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆19Nov 10, 2023Updated 2 years ago
- ☆54Mar 19, 2025Updated 11 months ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"☆19Oct 4, 2022Updated 3 years ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆23Oct 15, 2024Updated last year
- ☆92Nov 25, 2023Updated 2 years ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆107Aug 21, 2025Updated 6 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆30Mar 5, 2024Updated last year
- ☆27Jul 20, 2024Updated last year
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Dec 23, 2024Updated last year
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…☆329Oct 14, 2025Updated 4 months ago
- ☆55Apr 1, 2024Updated last year
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"☆28May 28, 2024Updated last year
- ☆30May 22, 2024Updated last year
- ☆101Dec 22, 2023Updated 2 years ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆106Mar 14, 2024Updated last year
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆58Sep 26, 2024Updated last year
- An Easy-to-use Hallucination Detection Framework for LLMs.☆63Apr 21, 2024Updated last year
- ☆32Feb 8, 2024Updated 2 years ago
- ☆32Mar 25, 2024Updated last year
- [ECCV2024] Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajector…☆30Nov 15, 2025Updated 3 months ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight☆37May 23, 2023Updated 2 years ago
- Implementation of the model from "Faster sorting algorithms discovered using deep reinforcement learning" that discovered an all-new ult…☆11Aug 29, 2023Updated 2 years ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry☆42Jan 15, 2024Updated 2 years ago
- Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning".☆165Dec 27, 2023Updated 2 years ago
- Introduction to Machine Learning using scikit-learn and PyTorch☆10Sep 26, 2019Updated 6 years ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- ☆15Feb 12, 2026Updated 2 weeks ago
- ☆11Jun 18, 2023Updated 2 years ago
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"☆25Jul 21, 2025Updated 7 months ago