evaleval/every_eval_ever

Every Eval Ever is a shared schema and crowdsourced eval database. It defines a standardized metadata format for storing AI evaluation results — from leaderboard scrapes and research papers to local evaluation runs — so that results from different frameworks can be compared, reproduced, and reused.

☆66

Alternatives and similar repositories for every_eval_ever

Users who are interested in every_eval_ever compare it to the repositories listed below.

thejaminator / latteries
James' cookbook of evaluations and finetuning experiments
☆27 • Feb 19, 2026 • Updated 3 months ago
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆18 • Sep 10, 2024 • Updated last year
jonkahana / CLIPPR
An official PyTorch implementation for CLIPPR
☆31 • Jul 22, 2023 • Updated 2 years ago
apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
☆152 • Oct 2, 2025 • Updated 7 months ago
BatsResearch / LexC-Gen
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
☆20 • Oct 3, 2024 • Updated last year
BatsResearch / crosslingual-test-time-scaling
Crosslingual Reasoning through Test-Time Scaling
☆19 • May 13, 2025 • Updated last year
aws-samples / amazon-sagemaker-container-with-fastai
Build a Docker container to build, train and deploy fast.ai based Deep Learning models with Amazon SageMaker
☆13 • Dec 15, 2018 • Updated 7 years ago
siyan-zhao / ICL_decision_boundary
official code for paper Probing the Decision Boundaries of In-context Learning in Large Language Models. https://arxiv.org/abs/2406.11233…
☆20 • Jul 27, 2025 • Updated 9 months ago
lovodkin93 / attribute-first-then-generate
Repository for "Attribute First, then Generate: Locally-attributable Grounded Text Generation", ACL 2024
☆30 • Dec 19, 2024 • Updated last year
songys / 2021Langcon
☆11 • Oct 3, 2021 • Updated 4 years ago
BatsResearch / nayak-aclfindings24-code
☆22 • Jul 16, 2024 • Updated last year
ko-nlp / moducorpus-sanitizer
Converts Modu Corpus (모두의 말뭉치) data into a format that is convenient for analysis.
☆11 • Mar 2, 2022 • Updated 4 years ago
koayon / awesome-sparse-autoencoders
A curated reading list of research in Sparse Autoencoders, Feature Extraction and related topics in Mechanistic Interpretability
☆32 • Jan 30, 2025 • Updated last year
LARK-AI-Lab / CodeScaler
The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"
☆33 • Mar 26, 2026 • Updated last month
ajobi-uhc / seer
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …
☆145 • Feb 8, 2026 • Updated 3 months ago
isle-dev / MetricEval
MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va…
☆12 • Nov 6, 2023 • Updated 2 years ago
safety-research / finetuning-auditor
Auditing agents for fine-tuning safety
☆21 • Oct 21, 2025 • Updated 7 months ago
UlisseMini / ana
The AI that helps you achieve your goals
☆11 • Feb 4, 2024 • Updated 2 years ago
ezyang / ai-blindspots
Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis.
☆13 • Mar 20, 2025 • Updated last year
salesforce / dialog-flow-extraction
☆15 • Mar 3, 2022 • Updated 4 years ago
anthropics / rogue-deploy-eval
☆14 • Jan 21, 2025 • Updated last year
alacritty / termbenchbot
Automated terminal emulator benchmarks
☆23 • May 1, 2026 • Updated 2 weeks ago
telepathylabsai / BETOLD_dataset
☆10 • Nov 1, 2022 • Updated 3 years ago
EleutherAI / clt-training
Sparsify transformers with cross-layer transcoders
☆23 • Nov 14, 2025 • Updated 6 months ago
cceyda / image-checker
Fast Image Integrity Checker: Scan for corrupted images using Nvidia DALI
☆22 • Jun 20, 2021 • Updated 4 years ago
telepathylabsai / dialog_breakdown_detection
☆10 • Nov 8, 2022 • Updated 3 years ago
eburghar / l3charts
Customizable charts made with TikZ and LaTeX3
☆14 • Feb 11, 2023 • Updated 3 years ago
universelabs / universe.engineering
Universe website
☆10 • Mar 3, 2023 • Updated 3 years ago
johanhelsing / bevy_touch_stick
An analog touch screen joystick that pretends to be a bevy gamepad
☆13 • Jul 13, 2024 • Updated last year
sandbox-social / mastodon-sim
Generative Agent simulation of a Mastodon social network
☆26 • May 7, 2026 • Updated 2 weeks ago
DSBA-Lab / Contrastive-Accumulation
☆14 • Jul 7, 2024 • Updated last year
smartyfh / DST-ASSIST
ASSIST: Towards Label Noise-Robust Dialogue State Tracking
☆10 • Apr 11, 2022 • Updated 4 years ago
huashen218 / bidirectional-alignment-reading-list
ICLR 2025 Workshop & CHI 2025 SIG: "Bidirectional Human-AI Alignment"
☆54 • Aug 6, 2024 • Updated last year
mgonzalezbaile / rag-incident-cve-analysis
☆30 • Jul 2, 2025 • Updated 10 months ago
Valsure / MemFactory
☆54 • Apr 7, 2026 • Updated last month
texttron / AgentIR
AgentIR is a retriever specialized for Deep Research agents.
☆56 • Apr 16, 2026 • Updated last month
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 • Jul 18, 2024 • Updated last year
naver-ai / cs-shortcut
Saving Dense Retriever from Shortcut Dependency in Conversational Search (EMNLP 2022)
☆18 • Nov 24, 2022 • Updated 3 years ago
clarifying-EM / model-organisms-for-EM
Code repo for the model organisms and convergent directions of EM papers.
☆64 • Sep 22, 2025 • Updated 7 months ago