HumanEval-V / HumanEval-V-Benchmark
A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks
☆6 Updated 3 weeks ago
Alternatives and similar repositories for HumanEval-V-Benchmark:
Users who are interested in HumanEval-V-Benchmark are comparing it to the repositories listed below.
- ☆76 Updated this week
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆90 Updated 10 months ago
- ☆23 Updated 5 months ago
- ☆25 Updated 8 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆58 Updated last month
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆82 Updated 8 months ago
- ☆59 Updated 6 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆23 Updated 3 months ago
- A Survey on the Honesty of Large Language Models ☆56 Updated 3 months ago
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆55 Updated 2 months ago
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 4 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆56 Updated 3 months ago
- Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024) ☆32 Updated 8 months ago
- [EMNLP 2024] Multi-modal reasoning problems via code generation. ☆20 Updated last month
- e ☆25 Updated this week
- The official code repository for PRMBench. ☆68 Updated last month
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆53 Updated 11 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆47 Updated 4 months ago
- ☆43 Updated last month
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆29 Updated 3 months ago
- ☆28 Updated 4 months ago
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts ☆30 Updated 8 months ago
- Codebase for decoding compressed trust. ☆23 Updated 10 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆33 Updated 8 months ago
- ☆13 Updated 8 months ago
- ☆32 Updated 5 months ago