davidhershey / ClaudePlaysPokemonStarter
☆157 Updated 5 months ago
Alternatives and similar repositories for ClaudePlaysPokemonStarter
Users who are interested in ClaudePlaysPokemonStarter are comparing it to the libraries listed below.
- Benchmark environment for evaluating vision-language models (VLMs) on popular video games! ☆301 Updated 3 months ago
- ☆159 Updated 2 months ago
- ☆90 Updated 2 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆321 Updated 10 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆146 Updated 6 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 10 months ago
- Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words ☆136 Updated 2 weeks ago
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆188 Updated 2 weeks ago
- Plotting (entropy, varentropy) for small LMs ☆98 Updated 3 months ago
- ☆163 Updated 8 months ago
- Testing baseline LLMs performance across various models ☆305 Updated last month
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication ☆59 Updated 8 months ago
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆451 Updated last month
- Train your own SOTA deductive reasoning model ☆105 Updated 6 months ago
- ☆56 Updated last month
- Code for Discovering Preference Optimization Algorithms with and for Large Language Models ☆190 Updated last year
- Worker to orchestrate and manage running an arbitrary number of LLM-generated builds concurrently using containerized Minecraft Servers. ☆169 Updated 9 months ago
- A benchmark for emotional intelligence in large language models ☆351 Updated last year
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆670 Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 7 months ago
- Alice in Wonderland code base for experiments and raw experiments data ☆131 Updated last month
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. ☆223 Updated last month
- An AI-powered game playing agent using Claude and PyBoy ☆32 Updated 5 months ago
- LLM Chess - Large Language Models Competing in Chess ☆63 Updated this week
- ☆94 Updated 2 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆257 Updated this week
- ☆457 Updated 3 months ago
- ☆201 Updated 6 months ago
- OSS RL environment + evals toolkit ☆159 Updated this week
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆337 Updated 2 months ago