PalisadeResearch / ctfish
Chess agent specification gaming
☆23 · Updated this week
Alternatives and similar repositories for ctfish
Users who are interested in ctfish are comparing it to the repositories listed below.
- Clue-inspired puzzles for testing LLM deduction abilities ☆45 · Updated 10 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Updated 9 months ago
- Official repository of the 2025 paper, LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra. ☆63 · Updated last week
- A collection of lightweight interpretability scripts to understand how LLMs think ☆89 · Updated last week
- Lego for GRPO ☆30 · Updated 8 months ago
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication ☆59 · Updated last year
- OpenPipe Reinforcement Learning Experiments ☆32 · Updated 10 months ago
- Verification of Google DeepMind's AlphaEvolve 48-multiplication matrix algorithm, a breakthrough in matrix multiplication after 56 years. ☆131 · Updated 7 months ago
- Automated Capability Discovery via Foundation Model Self-Exploration ☆65 · Updated 11 months ago
- Interactive Textbook Demo ☆52 · Updated 3 months ago
- ☆134 · Updated 4 months ago
- A novel approach for transformer model introspection that enables saving, compressing, and manipulating internal thought states for advan… ☆29 · Updated 10 months ago
- RAG Agent for the ARC AGI Challenge ☆20 · Updated last year
- Alice in Wonderland code base for experiments and raw experiments data ☆131 · Updated 4 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆58 · Updated 10 months ago
- ☆55 · Updated last year
- ☆25 · Updated 8 months ago
- ☆28 · Updated 9 months ago
- ☆105 · Updated 6 months ago
- Simple GRPO scripts and configurations. ☆59 · Updated 11 months ago
- LLM reads a paper and produces a working prototype ☆60 · Updated 9 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 · Updated 9 months ago
- ☆95 · Updated last week
- ☆27 · Updated last year
- ☆40 · Updated last year
- ☆106 · Updated 7 months ago
- ☆67 · Updated 6 months ago
- Universal Reasoning Model ☆121 · Updated 2 weeks ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆189 · Updated this week
- ☆39 · Updated last year