facebookresearch / polymath
AI Agent leveraging symbolic reasoning and other auxiliary tools to boost its capabilities on various logic and reasoning benchmarks. This project aims to develop a robust and flexible AI system that can tackle complex problems in areas such as decision-making, mathematics, and programming.
☆38 Updated 3 months ago
Alternatives and similar repositories for polymath
Users who are interested in polymath are comparing it to the repositories listed below
- LeanUniverse: A Library for Consistent and Scalable Lean4 Dataset Management ☆75 Updated 10 months ago
- This is the official repository for all the code of TheoremLlama ☆47 Updated 4 months ago
- ☆312 Updated 2 months ago
- ☆42 Updated last year
- ☆75 Updated last year
- Large language models designed for formal theorem proving through tool-integrated reasoning. ☆30 Updated 3 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated last year
- ☆72 Updated last month
- UQ: Assessing Language Models on Unsolved Questions ☆29 Updated 3 months ago
- Fluid Language Model Benchmarking ☆22 Updated 2 months ago
- RLP: Reinforcement as a Pretraining Objective ☆205 Updated 2 months ago
- BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c… ☆38 Updated 7 months ago
- ☆155 Updated 2 weeks ago
- ☆34 Updated last month
- ☆41 Updated last year
- ☆218 Updated 8 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆54 Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆84 Updated 8 months ago
- LIMI: Less is More for Agency ☆151 Updated last month
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆112 Updated 2 months ago
- ☆27 Updated 2 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆131 Updated last year
- ☆38 Updated 3 months ago
- Multi-Granularity LLM Debugger [ICSE2026] ☆93 Updated 5 months ago
- ☆35 Updated 6 months ago
- ☆11 Updated 7 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆225 Updated last week
- AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each… ☆74 Updated 2 weeks ago
- ☆73 Updated 4 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated last year