davidhershey / ClaudePlaysPokemonStarter
☆184 · Updated 9 months ago
Alternatives and similar repositories for ClaudePlaysPokemonStarter
Users who are interested in ClaudePlaysPokemonStarter are comparing it to the repositories listed below.
- Benchmark environment for evaluating vision-language models (VLMs) on popular video games! ☆326 · Updated 7 months ago
- ☆176 · Updated last month
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆512 · Updated last month
- ☆99 · Updated this week
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆345 · Updated last year
- smolLM with Entropix sampler on pytorch ☆149 · Updated last year
- ☆483 · Updated 6 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 · Updated last week
- ☆67 · Updated 6 months ago
- Testing baseline LLMs' performance across various models ☆335 · Updated this week
- Code for Discovering Preference Optimization Algorithms with and for Large Language Models ☆192 · Updated last year
- ☆86 · Updated 6 months ago
- Curated collection of community environments ☆204 · Updated last week
- ☆312 · Updated last month
- Plotting (entropy, varentropy) for small LMs ☆99 · Updated 7 months ago
- ☆189 · Updated last year
- LLM Chess - evaluating Large Language Models' reasoning and instruction-following abilities by simulating chess games ☆85 · Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 · Updated last year
- explore token trajectory trees on instruct and base models ☆150 · Updated 7 months ago
- smol models are fun too ☆93 · Updated last year
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆435 · Updated last week
- SoTA Approach for ARC-AGI 2 ☆157 · Updated 4 months ago
- ☆133 · Updated last year
- Worker to orchestrate and manage running an arbitrary number of LLM-generated builds concurrently using containerized Minecraft Servers. ☆167 · Updated last year
- The official code implementation for "Cache-to-Cache: Direct Semantic Communication Between Large Language Models" ☆314 · Updated this week
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆296 · Updated last week
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆224 · Updated last month
- 🧬 The Huxley-Gödel Machine ☆317 · Updated last month
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆330 · Updated 4 months ago
- Clue inspired puzzles for testing LLM deduction abilities ☆44 · Updated 9 months ago