simple-bench / SimpleBench
☆70 Updated last month
Alternatives and similar repositories for SimpleBench:
Users who are interested in SimpleBench are comparing it to the libraries listed below:
- ☆112 Updated 5 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆396 Updated 4 months ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆222 Updated 9 months ago
- Draw more samples ☆185 Updated 7 months ago
- ☆151 Updated 6 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆120 Updated this week
- ☆48 Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆69 Updated 3 months ago
- ☆109 Updated last month
- This is our own implementation of 'Layer Selective Rank Reduction' ☆232 Updated 8 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆64 Updated last month
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆62 Updated 2 months ago
- Approximation of the Claude 3 tokenizer by inspecting generation stream ☆120 Updated 6 months ago
- Keeping my personal experiments separate from the main repo ☆64 Updated 4 months ago
- A continuously learning web-browsing AI agent that generalizes the Voyager architecture. ☆37 Updated last year
- A simple Python sandbox for helpful LLM data agents ☆216 Updated 7 months ago
- ☆74 Updated last year
- smol models are fun too ☆87 Updated 2 months ago
- Just a bunch of benchmark logs for different LLMs ☆117 Updated 6 months ago
- smolLM with Entropix sampler on pytorch ☆148 Updated 3 months ago
- ☆96 Updated 3 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆158 Updated 2 weeks ago
- Automates the process of prompt engineering using Anthropic's Claude language model. ☆65 Updated 10 months ago
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus. ☆151 Updated last week
- A benchmark for emotional intelligence in large language models ☆216 Updated 6 months ago
- Routing on Random Forest (RoRF) ☆100 Updated 4 months ago
- This repository explains and provides examples for "concept anchoring" in GPT4. ☆72 Updated last year
- The history files when recording human interaction while solving ARC tasks ☆96 Updated last week
- Function Calling Benchmark & Testing ☆79 Updated 6 months ago
- An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast ☆142 Updated 4 months ago