aidanmclaughlin / AidanBench
Aidan Bench attempts to measure <big_model_smell> in LLMs.
☆274 Updated this week
Alternatives and similar repositories for AidanBench:
Users who are interested in AidanBench are comparing it to the libraries listed below.
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆408 Updated 4 months ago
- ☆100 Updated last month
- smol models are fun too ☆88 Updated 3 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 3 months ago
- Simple Transformer in Jax ☆136 Updated 7 months ago
- A Loom implementation in Obsidian ☆280 Updated 5 months ago
- Fast parallel LLM inference for MLX ☆163 Updated 7 months ago
- Draw more samples ☆186 Updated 7 months ago
- Long context evaluation for large language models ☆200 Updated last week
- ☆96 Updated 4 months ago
- ☆94 Updated 4 months ago
- ☆112 Updated 6 months ago
- Reasoning Computers. Lambda Calculus, Fully Differentiable. Also Neural Stacks, Queues, Arrays, Lists, Trees, and Latches. ☆245 Updated 3 months ago
- A library for making RepE control vectors ☆551 Updated last month
- ☆265 Updated 3 weeks ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆222 Updated 9 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆129 Updated this week
- Testing baseline LLMs performance across various models ☆222 Updated last week
- ☆396 Updated 6 months ago
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus. ☆159 Updated 2 weeks ago
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user… ☆164 Updated this week
- ☆147 Updated 2 months ago
- Extract full next-token probabilities via language model APIs ☆228 Updated 11 months ago
- ShellSage saves sysadmins’ sanity by solving shell script snafus super swiftly ☆288 Updated last week
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆222 Updated this week
- look how they massacred my boy ☆63 Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 6 months ago