dvlab-research / MR-GSM8K
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
☆41Updated 4 months ago
Related projects ⓘ
Alternatives and complementary repositories for MR-GSM8K
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆85Updated last month
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆129Updated last month
- ☆62Updated last month
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI☆78Updated this week
- ☆116Updated 5 months ago
- Self-Alignment with Principle-Following Reward Models☆148Updated 8 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"☆38Updated 3 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆72Updated 3 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆127Updated last month
- FuseAI Project☆76Updated 2 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆50Updated 6 months ago
- This is the official repository for Inheritune.☆105Updated last month
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track)☆43Updated this week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆61Updated 3 weeks ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆46Updated 2 months ago
- Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding☆37Updated 3 weeks ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 9 months ago
- Reformatted Alignment☆112Updated last month
- A repository for research on medium sized language models.☆74Updated 5 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆102Updated 6 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning"☆91Updated 4 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆73Updated 9 months ago
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment☆64Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users☆194Updated last week
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 8 months ago
- The official implementation of Self-Exploring Language Models (SELM)☆56Updated 5 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆111Updated last week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆106Updated 2 weeks ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆57Updated last month
- ☆49Updated 6 months ago