protagolabs / odyssey-math
☆83 · Updated 9 months ago
Alternatives and similar repositories for odyssey-math
Users who are interested in odyssey-math are comparing it to the libraries listed below.
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆142 · Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆123 · Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆100 · Updated last year
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) ☆36 · Updated last year
- ☆55 · Updated 5 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆31 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆78 · Updated 3 months ago
- The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset ☆159 · Updated last year
- ☆195 · Updated 6 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆65 · Updated 8 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆111 · Updated this week
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆132 · Updated last year
- ☆53 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆168 · Updated last month
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆150 · Updated last month
- PASTA: Post-hoc Attention Steering for LLMs ☆125 · Updated 11 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆181 · Updated 6 months ago
- ☆103 · Updated last year
- ☆98 · Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆82 · Updated last year
- ☆128 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆117 · Updated 2 years ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆148 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆24 · Updated last year
- ☆165 · Updated last year
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆57 · Updated last year
- GenRM-CoT: Data release for verification rationales ☆67 · Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆63 · Updated last year
- A unified benchmark for math reasoning ☆88 · Updated 2 years ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆174 · Updated 5 months ago