Testing baseline LLM performance across various models
☆338Feb 10, 2026Updated 3 weeks ago
Alternatives and similar repositories for arc-agi-benchmarking
Users who are interested in arc-agi-benchmarking are comparing it to the repositories listed below
- ☆655May 22, 2025Updated 9 months ago
- ☆15Jun 19, 2025Updated 8 months ago
- My submission to the ARC-AGI-3 Developer Preview Agent Competition.☆42Jan 27, 2026Updated last month
- ☆158Feb 20, 2026Updated 2 weeks ago
- Evaluating major LLMs on the Abstraction and Reasoning Corpus☆17Nov 9, 2023Updated 2 years ago
- The Abstraction and Reasoning Corpus☆4,724Apr 4, 2025Updated 11 months ago
- Video Diffusion Model. Autoregressive, long context, efficient training and inference. WIP☆35Feb 17, 2026Updated 2 weeks ago
- Bootstrapping ARC☆155Nov 20, 2024Updated last year
- ☆30Aug 7, 2025Updated 7 months ago
- ☆19Jul 31, 2025Updated 7 months ago
- Like ARC, but code to generate visual puzzles. 1D puzzles first.☆22Aug 17, 2024Updated last year
- Draw more samples☆198Jun 23, 2024Updated last year
- ☆27Aug 16, 2025Updated 6 months ago
- Domain Specific Language for the Abstraction and Reasoning Corpus☆321Oct 11, 2024Updated last year
- Reverse Engineering the Abstraction and Reasoning Corpus☆333Feb 24, 2025Updated last year
- Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024)☆20Sep 27, 2024Updated last year
- ☆38Feb 25, 2024Updated 2 years ago
- ☆485Jul 18, 2025Updated 7 months ago
- Unit Scaling demo and experimentation code☆16Mar 12, 2024Updated last year
- Aidan Bench attempts to measure <big_model_smell> in LLMs.☆318Jun 26, 2025Updated 8 months ago
- A Gymnasium-based Environment of the Abstraction and Reasoning Corpus (ARC)☆69Aug 30, 2024Updated last year
- Implementation of SOAR☆51Sep 17, 2025Updated 5 months ago
- An Open Source SLM Trained for MCP☆23May 18, 2025Updated 9 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"☆343Nov 10, 2025Updated 3 months ago
- Abstract Reasoning with Graph Abstractions (ARGA) implementation☆61Jul 5, 2024Updated last year
- ☆100Feb 24, 2026Updated last week
- ☆24Feb 18, 2026Updated 2 weeks ago
- Framework enabling modular interchange of language agents, environments, and optimizers☆124Mar 2, 2026Updated last week
- A GPT with self-similar nested properties☆20Mar 19, 2024Updated last year
- ☆23Apr 4, 2024Updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper)☆108Nov 25, 2025Updated 3 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model☆865Dec 29, 2025Updated 2 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH☆26Dec 23, 2024Updated last year
- My solution for the Abstraction and Reasoning Challenge on Kaggle☆10Jun 23, 2024Updated last year
- Run GEPA on your favorite non-python libraries.☆33Jan 22, 2026Updated last month
- Some basic tools for interacting with `tcf-agent`☆11Jan 19, 2024Updated 2 years ago
- Stuff related to scraping the Code Review StackExchange☆12Jan 19, 2023Updated 3 years ago
- A model-based API Fuzzer for SMT Solvers.☆15Oct 14, 2025Updated 4 months ago
- A PyTorch implementation of "Measuring abstract reasoning in neural networks" in ICML 2018 by DeepMind☆37Jul 8, 2023Updated 2 years ago