oashua / MathAgent
Code repo for MathAgent
☆13Updated last year
Alternatives and similar repositories for MathAgent:
Users who are interested in MathAgent are comparing it to the libraries listed below
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Updated last year
- NeurIPS 2024 tutorial on LLM Inference☆37Updated last month
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335)☆29Updated 10 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆46Updated last year
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data☆32Updated 5 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044☆30Updated 3 months ago
- The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers"☆19Updated 10 months ago
- Official implementation of AAAI 2025 paper "Augmenting Math Word Problems via Iterative Question Composing"(https://arxiv.org/abs/2401.09…☆18Updated last month
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.☆23Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆31Updated 11 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (COLING 2025)☆25Updated last month
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆51Updated 9 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Updated 7 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting"☆57Updated 6 months ago
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs☆34Updated 11 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆24Updated 10 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆57Updated this week
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".☆27Updated 5 months ago
- Evaluate the Quality of Critique☆35Updated 7 months ago
- Repository for Skill Set Optimization☆12Updated 5 months ago
- This is the official repository for all the code of TheoremLlama☆34Updated 3 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated 11 months ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Updated last year
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support…☆36Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆44Updated 3 weeks ago