davidhershey / ClaudePlaysPokemonStarter
☆138Updated 3 months ago
Alternatives and similar repositories for ClaudePlaysPokemonStarter
Users who are interested in ClaudePlaysPokemonStarter are comparing it to the repositories listed below
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆140Updated 4 months ago
- ☆154Updated 2 weeks ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.☆316Updated 8 months ago
- ☆88Updated last month
- smolLM with Entropix sampler on PyTorch☆150Updated 8 months ago
- Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words☆130Updated this week
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication☆60Updated 7 months ago
- ☆132Updated 6 months ago
- Benchmark environment for evaluating vision-language models (VLMs) on popular video games!☆285Updated last month
- ☆308Updated 3 months ago
- ⚖️ Awesome LLM Judges ⚖️☆107Updated 2 months ago
- A repository for training nanoGPT-based chess-playing language models.☆24Updated last year
- Train your own SOTA deductive reasoning model☆99Updated 4 months ago
- explore token trajectory trees on instruct and base models☆134Updated last month
- Plotting (entropy, varentropy) for small LMs☆97Updated last month
- A simple MLX implementation for pretraining LLMs on Apple Silicon.☆81Updated 2 months ago
- Testing baseline LLM performance across various models☆284Updated last week
- Worker to orchestrate and manage running an arbitrary number of LLM-generated builds concurrently using containerized Minecraft Servers.☆169Updated 7 months ago
- Clue-inspired puzzles for testing LLM deduction abilities☆38Updated 3 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆173Updated 6 months ago
- Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM…☆56Updated last week
- Alice in Wonderland code base for experiments and raw experiment data☆131Updated 3 weeks ago
- A benchmark for emotional intelligence in large language models☆323Updated 11 months ago
- ☆72Updated last month
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co…☆282Updated this week
- Inference-time scaling for LLMs-as-a-judge.☆251Updated this week
- ☆130Updated 2 months ago
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user…☆177Updated last week
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.☆183Updated last week
- Fast parallel LLM inference for MLX☆198Updated last year