facebookresearch / polymath
AI Agent leveraging symbolic reasoning and other auxiliary tools to boost its capabilities on various logic and reasoning benchmarks. This project aims to develop a robust and flexible AI system that can tackle complex problems in areas such as decision-making, mathematics, and programming.
☆38 Updated 5 months ago
Alternatives and similar repositories for polymath
Users who are interested in polymath compare it to the repositories listed below.
- ☆42 Updated last year
- LeanUniverse: A Library for Consistent and Scalable Lean4 Dataset Management ☆75 Updated 11 months ago
- This is the official repository for all the code of TheoremLlama ☆47 Updated 5 months ago
- ☆40 Updated 3 weeks ago
- Large language models designed for formal theorem proving through tool-integrated reasoning. ☆31 Updated 5 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 Updated 3 months ago
- ☆79 Updated last year
- ☆395 Updated 3 weeks ago
- BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c… ☆39 Updated 8 months ago
- ☆74 Updated this week
- ☆41 Updated last year
- ☆45 Updated 6 months ago
- ☆21 Updated 5 months ago
- This repository contains popular code generation frameworks such as MapCoder, CodeSIM. ☆68 Updated 6 months ago
- Our solution to Putnam 2025. ☆36 Updated this week
- Official Repository of Native Parallel Reasoner ☆92 Updated 3 weeks ago
- ☆178 Updated last month
- UQ: Assessing Language Models on Unsolved Questions ☆29 Updated 4 months ago
- ☆224 Updated 9 months ago
- Multi-Granularity LLM Debugger [ICSE2026] ☆94 Updated 6 months ago
- For ACL25 paper "WAFFLE: Multi-Modal Model for Automated Front-End Development" - by Shanchao Liang and Nan Jiang and Shangshu Qian and L… ☆11 Updated 7 months ago
- LLMs + Lean, on your laptop or in the cloud ☆199 Updated 3 months ago
- General benchmarking apparatus for running multi-agent systems against benchmarks ☆39 Updated 3 weeks ago
- ☆29 Updated 2 months ago
- ☆47 Updated 5 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆125 Updated 3 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆87 Updated 9 months ago
- Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization ☆38 Updated last month
- Monadic Context Engineering ☆16 Updated this week
- ☆79 Updated 2 months ago