davidhershey / ClaudePlaysPokemonStarter
☆136 Updated 2 months ago
Alternatives and similar repositories for ClaudePlaysPokemonStarter
Users who are interested in ClaudePlaysPokemonStarter are comparing it to the libraries listed below.
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆140 Updated 4 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 7 months ago
- Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words ☆101 Updated 2 weeks ago
- ⚖️ Awesome LLM Judges ⚖️ ☆105 Updated 2 months ago
- explore token trajectory trees on instruct and base models ☆127 Updated last month
- ☆153 Updated this week
- smol models are fun too ☆93 Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 5 months ago
- ☆302 Updated 2 months ago
- Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM… ☆54 Updated 3 weeks ago
- ☆211 Updated last week
- Train your own SOTA deductive reasoning model ☆94 Updated 3 months ago
- ☆128 Updated 6 months ago
- Scale your LLM-as-a-judge. ☆240 Updated 3 weeks ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆80 Updated last month
- Open source interpretability artefacts for R1. ☆149 Updated 2 months ago
- Plotting (entropy, varentropy) for small LMs ☆97 Updated last month
- Clue inspired puzzles for testing LLM deduction abilities ☆38 Updated 3 months ago
- ☆156 Updated 3 months ago
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. ☆177 Updated 2 weeks ago
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆156 Updated last month
- Claude Deep Research config for Claude Code. ☆187 Updated 3 months ago
- Alice in Wonderland code base for experiments and raw experiments data ☆131 Updated last week
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆305 Updated this week
- A repository for training nanogpt-based Chess playing language models. ☆24 Updated last year
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆514 Updated this week
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆277 Updated 2 weeks ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆64 Updated 7 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆184 Updated this week
- ☆127 Updated 3 months ago