aidanmclaughlin / AidanBench
Aidan Bench attempts to measure <big_model_smell> in LLMs.
☆290Updated this week
Alternatives and similar repositories for AidanBench:
Users who are interested in AidanBench are comparing it to the libraries listed below.
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆429Updated 6 months ago
- smol models are fun too☆92Updated 5 months ago
- Simple Transformer in Jax☆136Updated 9 months ago
- ☆97Updated 6 months ago
- ☆94Updated 6 months ago
- ☆107Updated 3 months ago
- smolLM with Entropix sampler on pytorch☆151Updated 5 months ago
- A Loom implementation in Obsidian☆291Updated 3 weeks ago
- Fast parallel LLM inference for MLX☆179Updated 9 months ago
- ☆282Updated last week
- ShellSage saves sysadmins’ sanity by solving shell script snafus super swiftly☆312Updated 2 weeks ago
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user…☆169Updated last week
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI☆223Updated 11 months ago
- ☆55Updated last month
- Testing baseline LLMs performance across various models☆248Updated this week
- Long context evaluation for large language models☆206Updated last month
- ⚖️ Awesome LLM Judges ⚖️☆90Updated last month
- ComplexTensor: Machine Learning By Bridging Classical and Quantum Computation☆75Updated 4 months ago
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus.☆218Updated this week
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆63Updated 5 months ago
- ☆112Updated 3 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆139Updated last month
- Draw more samples☆189Updated 9 months ago
- procedural reasoning datasets☆559Updated this week
- look how they massacred my boy☆63Updated 6 months ago
- ☆151Updated 4 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning☆102Updated this week
- Letting Claude Code develop his own MCP tools :)☆97Updated last month
- ☆117Updated 8 months ago
- Claude Deep Research config for Claude Code.☆165Updated last month