PalisadeResearch / ctfish
Chess agent specification gaming
☆20Updated last week
Alternatives and similar repositories for ctfish
Users that are interested in ctfish are comparing it to the libraries listed below
- Official repository of the 2025 paper, LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra.☆40Updated 2 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper☆50Updated 5 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆33Updated 5 months ago
- OpenPipe Reinforcement Learning Experiments☆31Updated 6 months ago
- ☆25Updated 3 months ago
- A collection of lightweight interpretability scripts to understand how LLMs think☆44Updated last week
- ☆60Updated 2 months ago
- Automated Capability Discovery via Foundation Model Self-Exploration☆64Updated 7 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆55Updated 6 months ago
- ☆39Updated last year
- Lego for GRPO☆29Updated 3 months ago
- Simple repository for training small reasoning models☆40Updated 7 months ago
- Public repository containing METR's DVC pipeline for eval data analysis☆110Updated 5 months ago
- Code for☆27Updated 9 months ago
- A Gymnasium-based Environment of the Abstraction and Reasoning Corpus (ARC)☆68Updated last year
- ☆54Updated 10 months ago
- Very minimal (and stateless) agent framework☆45Updated 8 months ago
- Collection of LLM completions for reasoning-gym task datasets☆29Updated 2 months ago
- Lottery Ticket Adaptation☆39Updated 10 months ago
- A Python reimplementation of "Planning with Large Language Models for Code Generation" (https://arxiv.org/abs/2303.05510)☆18Updated last year
- Clue inspired puzzles for testing LLM deduction abilities☆43Updated 6 months ago
- How to create rational LLM-based agents? Using game-theoretic workflows!☆74Updated 3 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.☆15Updated last year
- Alice in Wonderland code base for experiments and raw experiments data☆131Updated last week
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆41Updated last year
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆61Updated 5 months ago
- ☆19Updated last month
- LLM reads a paper and produces a working prototype☆56Updated 5 months ago
- ☆27Updated last year