protagolabs / odyssey-math
☆83 Updated last month
Alternatives and similar repositories for odyssey-math:
Users who are interested in odyssey-math are comparing it to the repositories listed below:
- ☆94 Updated last year
- ☆34 Updated 11 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆79 Updated 7 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 Updated 6 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆24 Updated last year
- ☆93 Updated last year
- Replicating O1 inference-time scaling laws ☆83 Updated 3 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆73 Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆119 Updated 6 months ago
- Self-Alignment with Principle-Following Reward Models ☆156 Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆90 Updated 3 weeks ago
- ☆122 Updated 4 months ago
- ☆47 Updated 7 months ago
- A library for efficient patching and automatic circuit discovery. ☆59 Updated last month
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆54 Updated 3 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 Updated 11 months ago
- ☆48 Updated 11 months ago
- ☆156 Updated 2 weeks ago
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models ☆43 Updated last year
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆138 Updated 5 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆97 Updated 3 months ago
- GenRM-CoT: Data release for verification rationales ☆51 Updated 5 months ago
- ☆39 Updated 7 months ago
- ☆38 Updated 4 months ago
- The dataset and code for the paper "TheoremQA: A Theorem-driven Question Answering dataset" ☆157 Updated 11 months ago
- ☆115 Updated 8 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆149 Updated this week
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) ☆30 Updated 10 months ago
- ☆95 Updated 8 months ago