JHU-CLSP / RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆32 Updated 6 months ago
Alternatives and similar repositories for RATIONALYST:
Users who are interested in RATIONALYST are comparing it to the repositories listed below.
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆47 Updated last year
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆53 Updated 6 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- ☆42 Updated 7 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆29 Updated last month
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- Exploration of automated dataset selection approaches at large scales. ☆39 Updated last month
- ☆24 Updated 7 months ago
- A repository for research on medium-sized language models. ☆76 Updated 11 months ago
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆34 Updated last year
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv 2401.01335) ☆29 Updated last year
- ☆15 Updated 2 weeks ago
- ☆60 Updated 11 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆25 Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆54 Updated 4 months ago
- ☆41 Updated 3 weeks ago
- The repository contains code for Adaptive Data Optimization ☆23 Updated 4 months ago
- Revisiting Mid-training in the Era of RL Scaling ☆27 Updated this week
- ☆78 Updated 8 months ago
- ☆48 Updated 5 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆25 Updated 4 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆86 Updated last month
- Aioli: A unified optimization framework for language model data mixing ☆23 Updated 3 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 7 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 Updated 7 months ago
- ☆66 Updated last month
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆60 Updated 3 weeks ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆53 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆80 Updated 3 weeks ago