JIA-Lab-research/Mr-Ben

This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"

☆51

Alternatives and similar repositories for Mr-Ben

Users who are interested in Mr-Ben are comparing it to the repositories listed below.

rookie-joe / FormalAlign
☆17 · Jul 12, 2025 · Updated last year
SparksofAGI / MHPP
☆35 · Sep 14, 2025 · Updated 10 months ago
zhxieml / remiss-jailbreak
☆33 · Jun 24, 2024 · Updated 2 years ago
rookie-joe / PDA
☆36 · Jan 10, 2025 · Updated last year
pillowsofwind / Course-Correction
[EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"
☆20 · Oct 2, 2024 · Updated last year
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆30 · Nov 24, 2024 · Updated last year
liyucheng09 / Contamination_Detector
Lightweight tool to identify Data Contamination in LLMs evaluation
☆53 · Mar 8, 2024 · Updated 2 years ago
rookie-joe / AutoPSV
☆50 · Oct 28, 2024 · Updated last year
LCO-Embedding / LCO-Embedding
[NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning
☆48 · Apr 13, 2026 · Updated 3 months ago
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆159 · Sep 21, 2024 · Updated last year
Yingjia-Wan / FaStfact
Code repo for FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs.
☆33 · Nov 5, 2025 · Updated 8 months ago
princeton-nlp / ELIZA-Transformer
[NAACL 2025] Representing Rule-based Chatbots with Transformers
☆23 · Feb 9, 2025 · Updated last year
zzli2022 / TLDR
Code for Research Project TLDR
☆26 · Jul 28, 2025 · Updated 11 months ago
zhaoxlpku / SubgoalXL
☆26 · Aug 23, 2024 · Updated last year
MARIO-Math-Reasoning / MARIO
☆28 · May 8, 2024 · Updated 2 years ago
QiushiSun / Awesome-Code-Intelligence
Neural Code Intelligence Survey 2024-25; Reading lists and resources
☆281 · Jul 24, 2025 · Updated last year
pillowsofwind / llms-believe-the-earth-is-flat
[ACL 2024] The official GitHub repo for the paper "The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Pe…
☆82 · Jul 19, 2024 · Updated 2 years ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36 · Sep 24, 2024 · Updated last year
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
wangcunxiang / Can-PLM-Serve-as-KB-for-CBQA
The code and data for ACL2021 paper <Can Generative Pre-trained Language Models Serve as Knowledge Bases for Closed-book QA?>
☆22 · Dec 18, 2022 · Updated 3 years ago
filtir / awesome-AI-fact-checking
A collection of papers tackling automatic fact-checking (particularly of AI-generated content)
☆13 · Nov 3, 2023 · Updated 2 years ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆284 · May 23, 2025 · Updated last year
chuanyangjin / MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
☆159 · Jun 28, 2026 · Updated 3 weeks ago
chang-github-00 / Predictive-Decoding
Repo for Anonymous purpose, pls don't distribute
☆10 · Oct 2, 2024 · Updated last year
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year
JIA-Lab-research / Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
☆54 · Jul 16, 2024 · Updated 2 years ago
naszilla / nas-encodings
Encodings for neural architecture search
☆29 · Apr 5, 2021 · Updated 5 years ago
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
maxreciprocate / offline
Offline RL experiments
☆15 · Oct 1, 2022 · Updated 3 years ago
stas00 / python-tools
Python tools
☆14 · Oct 22, 2023 · Updated 2 years ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆172 · Jul 30, 2024 · Updated last year
YihongDong / CDD-TED4LLMs
☆16 · Nov 26, 2024 · Updated last year
xuefeng-li1 / Provably-end-to-end-label-noise-learning-without-anchor-points
☆15 · Jun 9, 2021 · Updated 5 years ago
Delikitty / Computer-Vision-16720-CMU
☆14 · Jul 30, 2017 · Updated 8 years ago
xufangzhi / NLP_HW2
The second Homework of NLP
☆13 · Jun 9, 2021 · Updated 5 years ago
xufangzhi / MoCA
[Pattern Recognition] The implementation of MoCA
☆12 · Apr 1, 2023 · Updated 3 years ago
CogComp / TAWT
Weighted Training for Cross-Task Learning
☆15 · Feb 12, 2023 · Updated 3 years ago
pipilurj / BONAS
☆35 · Sep 14, 2021 · Updated 4 years ago
cambridgeltl / PairS
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)
☆49 · Jan 21, 2025 · Updated last year