Essential-AI / reflection
☆38 Updated last month
Alternatives and similar repositories for reflection
Users who are interested in reflection are comparing it to the libraries listed below
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆62 Updated 6 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆61 Updated 5 months ago
- ☆63 Updated last week
- Revisiting Mid-training in the Era of RL Scaling ☆37 Updated 3 weeks ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆47 Updated 4 months ago
- ☆59 Updated 8 months ago
- Exploration of automated dataset selection approaches at large scales. ☆40 Updated 2 months ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆42 Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆95 Updated last week
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆32 Updated last year
- Large Language Models Can Self-Improve in Long-context Reasoning ☆69 Updated 5 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆49 Updated 6 months ago
- The official repository of the Omni-MATH benchmark. ☆82 Updated 4 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆45 Updated this week
- This is the implementation of LeCo ☆31 Updated 3 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆134 Updated 7 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆21 Updated 4 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆26 Updated last year
- The code and data for the paper JiuZhang3.0 ☆44 Updated 11 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆101 Updated 2 months ago
- ☆29 Updated 4 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆39 Updated 5 months ago
- Benchmarking Benchmark Leakage in Large Language Models ☆51 Updated 11 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆62 Updated 3 weeks ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆57 Updated 5 months ago
- ☆45 Updated 6 months ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆135 Updated 3 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆179 Updated 2 months ago
- ☆45 Updated last month
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆84 Updated 7 months ago