This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"
☆51Oct 31, 2024Updated last year
Alternatives and similar repositories for Mr-Ben
Users that are interested in Mr-Ben are comparing it to the libraries listed below
- ☆17Jul 12, 2025Updated 7 months ago
- ☆33Jun 24, 2024Updated last year
- This is the implementation of LeCo☆31Jan 20, 2025Updated last year
- ☆33Sep 14, 2025Updated 5 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- ☆35Jan 10, 2025Updated last year
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"☆20Oct 2, 2024Updated last year
- ☆25Aug 23, 2024Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"☆23Mar 4, 2025Updated last year
- Codebase for Instruction Following without Instruction Tuning☆36Sep 24, 2024Updated last year
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…☆30Nov 24, 2024Updated last year
- ☆20Nov 20, 2024Updated last year
- ☆16Nov 26, 2024Updated last year
- [NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning☆33Feb 6, 2026Updated 3 weeks ago
- Code, benchmark and environment for "OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows"☆38Nov 10, 2025Updated 3 months ago
- Code repo for FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs.☆32Nov 5, 2025Updated 4 months ago
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing☆21Mar 18, 2025Updated 11 months ago
- ☆22Jan 29, 2026Updated last month
- [ACL 2025] Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms☆36Jun 4, 2025Updated 9 months ago
- A MoE impl for PyTorch, [ATC'23] SmartMoE☆71Jul 11, 2023Updated 2 years ago
- ☆20Nov 3, 2024Updated last year
- ☆56Feb 11, 2026Updated 3 weeks ago
- Code and data of "Controllable Unsupervised Event-based Video Generation" (accepted as ICIP oral and invited by WACV workshop)☆19Nov 5, 2024Updated last year
- Knowledge Distillation Toolbox for Semantic Segmentation☆17Nov 20, 2022Updated 3 years ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper☆47Aug 13, 2025Updated 6 months ago
- [CVPR2025] Official Implementations "One-Way Ticket : Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models"☆28Jul 28, 2025Updated 7 months ago
- AutoHallusion Codebase (EMNLP 2024)☆22Dec 6, 2024Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"☆153Sep 21, 2024Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆47Jan 21, 2025Updated last year
- Aioli: A unified optimization framework for language model data mixing☆32Jan 17, 2025Updated last year
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation☆51Aug 24, 2025Updated 6 months ago
- ☆51Oct 28, 2024Updated last year
- ☆27Jul 11, 2024Updated last year
- The code and data for ACL2021 paper <Can Generative Pre-trained Language Models Serve as Knowledge Bases for Closed-book QA?>☆22Dec 18, 2022Updated 3 years ago
- ☆24Feb 16, 2025Updated last year
- ☆26Nov 21, 2022Updated 3 years ago
- ☆28Nov 10, 2025Updated 3 months ago
- ☆22Oct 21, 2024Updated last year