arcprizeorg / model_baseline
Testing baseline LLM performance across various models
☆222Updated 2 weeks ago
Alternatives and similar repositories for model_baseline:
Users who are interested in model_baseline are comparing it to the libraries listed below
- Draw more samples☆186Updated 7 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆411Updated 4 months ago
- Aidan Bench attempts to measure <big_model_smell> in LLMs.☆274Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆167Updated last month
- ☆96Updated 4 months ago
- ☆100Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆161Updated last week
- Fast parallel LLM inference for MLX☆163Updated 7 months ago
- An automated tool for discovering insights from research paper corpora☆136Updated 8 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"☆289Updated 3 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym☆354Updated last month
- smol models are fun too☆88Updated 3 months ago
- GRadient-INformed MoE☆261Updated 4 months ago
- ☆142Updated 2 months ago
- smolLM with Entropix sampler on pytorch☆150Updated 3 months ago
- ☆121Updated last week
- Code for ExploreTom☆75Updated 2 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆130Updated this week
- ☆78Updated 2 months ago
- ☆94Updated 4 months ago
- Sandboxed code execution for AI agents, locally or on the cloud.☆89Updated this week
- Long context evaluation for large language models☆200Updated last week
- WIP - Allows you to create DSPy pipelines using ComfyUI☆186Updated 2 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.☆287Updated 3 months ago