ianarawjo/evalstats

ianarawjo / evalstats

Statistical analysis methods for comparing prompt and model performance in LLM evaluations.

☆108

Alternatives and similar repositories for evalstats

Users interested in evalstats are comparing it to the libraries listed below.

FlashSampling / FlashSampling
FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)
☆76 · Jun 15, 2026 · Updated last month
collaborative-deep-research / agent-papers-cli
☆46 · Mar 22, 2026 · Updated 3 months ago
sneha-rk / data-recipes
☆37 · May 4, 2026 · Updated 2 months ago
0xD4rky / nanotok
☆27 · Jun 7, 2026 · Updated last month
mschnetzer / scrollytell_arbeitszeit
☆30 · Nov 8, 2024 · Updated last year
Infatoshi / physics-llm-inference
Companion code for The Physics of LLM Inference book
☆26 · Apr 21, 2026 · Updated 3 months ago
tengxiao1 / MR-Search
Meta-Reinforcement Learning with Self-Reflection
☆33 · Mar 26, 2026 · Updated 3 months ago
uwdata / divi
Automatically interact with SVG charts.
☆20 · Sep 23, 2025 · Updated 9 months ago
AnthonyRonning / pi-ax-model-optimization
☆36 · Apr 25, 2026 · Updated 2 months ago
doomslide / autoloom
Approximating the joint distribution of language models via MCTS
☆22 · Nov 3, 2024 · Updated last year
strangeloopcanon / tevo
TEVO: evolve LM motifs cheaply, then validate them in downstream train.py loops.
☆19 · Apr 18, 2026 · Updated 3 months ago
benjamin-kohler / social_science_replicability
☆26 · Apr 26, 2026 · Updated 2 months ago
allenai / infinigram-api
☆102 · Updated this week
UPB-SS1 / PyCrowdTangle
A Python Wrapper To Retrieve Data From The CrowdTangle API
☆11 · Mar 26, 2026 · Updated 3 months ago
alexisfox7 / RGB-Agent
☆127 · Jun 1, 2026 · Updated last month
hamelsmu / research-council
☆102 · Feb 27, 2026 · Updated 4 months ago
allenai / olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
☆72 · May 29, 2026 · Updated last month
MaximeRivest / dspy-lm-auth
☆32 · Mar 11, 2026 · Updated 4 months ago
juzhengz / logit-fusion
Learning from Mixed Rollouts: Logit Fusion as a Bridge Between Imitation and Exploration
☆17 · Feb 24, 2026 · Updated 4 months ago
ForBo7 / fastai-close-reading
Structured close reading (or rather, close watching) transcripts of _almost_ every lesson in Jeremy Howard's Practical Deep Learning for…
☆34 · Mar 9, 2026 · Updated 4 months ago
dicej / spin-teavm-example
Example of Spin app written in Java using TeaVM-WASI and wit-bindgen
☆12 · Dec 7, 2022 · Updated 3 years ago
halfprice06 / rlmgrep
☆66 · Feb 14, 2026 · Updated 5 months ago
modaic-ai / microcode
context-efficient terminal agent powered by an RLM
☆60 · Feb 7, 2026 · Updated 5 months ago
google-deepmind / simply
Minimal and scalable research codebase in JAX, designed for rapid iteration on frontier research in LLM and other autoregressive models.
☆548 · Jun 19, 2026 · Updated last month
lambda-calculus-LLM / lambda-RLM
Method for Long Context RLMs using verifiable Lambda Calculus
☆304 · Apr 24, 2026 · Updated 2 months ago
HarmanDotpy / pairwise-self-verification
[ICML 2026] Code for V1: Unifying Generation and Self-Verification for Parallel Reasoners.
☆39 · Mar 5, 2026 · Updated 4 months ago
hamelsmu / hamel
General Utilities
☆58 · Jun 21, 2026 · Updated last month
Pakillo / writing-reproducible-manuscripts
Talk on writing reproducible manuscripts
☆15 · Oct 8, 2020 · Updated 5 years ago
dataflowr / llm_efficiency
KV Cache & LoRA for minGPT
☆61 · Mar 4, 2026 · Updated 4 months ago
arcee-ai / pybubble
☆81 · Feb 18, 2026 · Updated 5 months ago
guestrin-lab / deepscholar
build and benchmark deep research
☆245 · Mar 28, 2026 · Updated 3 months ago
openprose / press
☆49 · Mar 31, 2026 · Updated 3 months ago
shauli-ravfogel / descriptions
☆10 · May 11, 2024 · Updated 2 years ago
yrvelez / qualtrics-mcp-server
☆29 · Jul 14, 2026 · Updated last week
Xodarap / tiktok-odds
☆11 · Sep 15, 2020 · Updated 5 years ago
FlorinAndrei / misc
a catch-all repo
☆11 · Dec 28, 2023 · Updated 2 years ago
bal2ag / cachual
Cache the return values of your Python functions with a simple decorator.
☆11 · Jan 17, 2017 · Updated 9 years ago
allenai / asta-paper-finder
frozen-in-time version of our Paper Finder agent for reproducing evaluation results
☆245 · Mar 17, 2026 · Updated 4 months ago
tslocz / hettreatreg
OLS Weights on Heterogeneous Treatment Effects
☆10 · Jun 15, 2020 · Updated 6 years ago