davidhershey / ClaudePlaysPokemonStarter
☆163 Updated 6 months ago
Alternatives and similar repositories for ClaudePlaysPokemonStarter
Users who are interested in ClaudePlaysPokemonStarter are comparing it to the repositories listed below.
- Benchmark environment for evaluating vision-language models (VLMs) on popular video games! ☆307 Updated 4 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆145 Updated 7 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 11 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆325 Updated 11 months ago
- ☆170 Updated 3 months ago
- ☆93 Updated 4 months ago
- Plotting (entropy, varentropy) for small LMs ☆98 Updated 4 months ago
- Worker to orchestrate and manage running an arbitrary number of LLM-generated builds concurrently using containerized Minecraft Servers. ☆166 Updated 10 months ago
- ☆124 Updated 9 months ago
- SoTA Approach for ARC-AGI 2 ☆103 Updated last month
- smol models are fun too ☆93 Updated 11 months ago
- LLM Chess - Large Language Models Competing in Chess ☆67 Updated 3 weeks ago
- A repo to evaluate various LLMs' chess-playing abilities. ☆82 Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 9 months ago
- Train your own SOTA deductive reasoning model ☆108 Updated 7 months ago
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ☆148 Updated last month
- ☆167 Updated 9 months ago
- ☆62 Updated 3 months ago
- Testing baseline LLM performance across various models ☆316 Updated last week
- A repository for training nanoGPT-based chess-playing language models. ☆26 Updated last year
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆290 Updated 2 months ago
- A benchmark for emotional intelligence in large language models ☆368 Updated last year
- ☆475 Updated 3 months ago
- Draw more samples ☆194 Updated last year
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆201 Updated 2 months ago
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication ☆57 Updated 10 months ago
- Prompts used in the Automated Auditing Blog Post ☆107 Updated 2 months ago
- ☆104 Updated this week
- ☆177 Updated 2 months ago
- Open source interpretability artefacts for R1. ☆161 Updated 5 months ago