Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
☆52Jul 10, 2024Updated last year
Alternatives and similar repositories for MR-GSM8K
Users that are interested in MR-GSM8K are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆51Oct 31, 2024Updated last year
- [CVPR 2024] GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding☆44Mar 15, 2024Updated 2 years ago
- Modified Beam Search with periodical restart☆12Sep 12, 2024Updated last year
- The reproduct of the paper - Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction☆21May 29, 2024Updated 2 years ago
- Evaluate the Quality of Critique☆37Jun 1, 2024Updated 2 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU☆13May 5, 2024Updated 2 years ago
- Data and code for APPDIA: A Discourse-aware Transformer-based Style Transfer Model for Offensive Social Media Conversations (COLING 2022)…☆13Sep 8, 2022Updated 3 years ago
- Train toy models using multi-token prediction objective☆14Apr 18, 2026Updated last month
- [SIGIR 2024] This is the official PyTorch implementation for the paper: "EulerFormer: Sequential User Behavior Modeling with Complex Vect…☆11Oct 1, 2024Updated last year
- [NeurIPS 2025] Training-Free Efficient Video Generation via Dynamic Token Carving☆279Aug 4, 2025Updated 10 months ago
- Extracting Cultural Commonsense Knowledge at Scale (WWW 2023)☆11Feb 15, 2024Updated 2 years ago
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models"☆46Jan 11, 2024Updated 2 years ago
- Knowledge Graph-augmented NMT☆11Sep 20, 2021Updated 4 years ago
- Explore how Flux Dev responds when you change the strengths of layers in the model.☆21Sep 20, 2024Updated last year
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- ☆83Apr 18, 2024Updated 2 years ago