davidhershey / ClaudePlaysPokemonStarter
☆159, updated 5 months ago
Alternatives and similar repositories for ClaudePlaysPokemonStarter
Users who are interested in ClaudePlaysPokemonStarter are comparing it to the libraries listed below.
- Benchmark environment for evaluating vision-language models (VLMs) on popular video games! (☆304, updated 3 months ago)
- smolLM with Entropix sampler on pytorch (☆150, updated 10 months ago)
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens (☆146, updated 7 months ago)
- ☆166, updated 9 months ago
- ☆165, updated 2 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. (☆321, updated 11 months ago)
- Plotting (entropy, varentropy) for small LMs (☆99, updated 4 months ago)
- Draw more samples (☆193, updated last year)
- Worker to orchestrate and manage running an arbitrary number of LLM-generated builds concurrently using containerized Minecraft Servers. (☆167, updated 9 months ago)
- Open source interpretability artefacts for R1. (☆159, updated 5 months ago)
- SoTA Approach for ARC-AGI 2 (☆86, updated last week)
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. (☆172, updated 8 months ago)
- Train your own SOTA deductive reasoning model (☆106, updated 6 months ago)
- A repo to evaluate various LLMs' chess-playing abilities. (☆83, updated last year)
- Code for Discovering Preference Optimization Algorithms with and for Large Language Models (☆190, updated last year)
- smol models are fun too (☆93, updated 10 months ago)
- ☆93, updated 3 months ago
- ☆60, updated 2 months ago
- ☆85, updated 2 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) (☆450, updated last year)
- Aidan Bench attempts to measure <big_model_smell> in LLMs. (☆312, updated 3 months ago)
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. (☆227, updated last month)
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … (☆697, updated this week)
- Training-Ready RL Environments + Evals (☆111, updated this week)
- ☆316, updated 2 months ago
- A benchmark for emotional intelligence in large language models (☆361, updated last year)
- ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution (☆261, updated this week)
- Async RL Training at Scale (☆650, updated this week)
- A simple MLX implementation for pretraining LLMs on Apple Silicon. (☆83, updated last month)
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling (☆472, updated last month)