Every Eval Ever is a shared schema and crowdsourced eval database. It defines a standardized metadata format for storing AI evaluation results — from leaderboard scrapes and research papers to local evaluation runs — so that results from different frameworks can be compared, reproduced, and reused.
☆73Jun 1, 2026Updated last week
Alternatives and similar repositories for every_eval_ever
Users that are interested in every_eval_ever are comparing it to the libraries listed below.
- James' cookbook of evaluations and finetuning experiments☆28Feb 19, 2026Updated 3 months ago
- ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.☆12Mar 23, 2023Updated 3 years ago
- Dataset for Unified Editing, EMNLP 2023. This is a model editing dataset where edits are natural language phrases.☆24Sep 4, 2024Updated last year
- An implementation of GrASP (Shnarch et al., 2017)☆23Aug 29, 2022Updated 3 years ago
- Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.☆20Oct 3, 2024Updated last year
- Official code for the paper "Probing the Decision Boundaries of In-context Learning in Large Language Models". https://arxiv.org/abs/2406.11233…☆20Jul 27, 2025Updated 10 months ago
- Repository for "Attribute First, then Generate: Locally-attributable Grounded Text Generation", ACL 2024☆30Dec 19, 2024Updated last year
- Official PyTorch implementation for "Classification-Regression for Chart Comprehension"☆26Feb 5, 2025Updated last year
- Provides utilities for converting Modu Corpus (모두의 말뭉치) data into a format convenient for analysis.☆11Mar 2, 2022Updated 4 years ago
- Project exploring 3D volumetric rendering of NEXRAD radar data.☆13Oct 23, 2023Updated 2 years ago
- A curated reading list of research in Sparse Autoencoders, Feature Extraction and related topics in Mechanistic Interpretability☆32Jan 30, 2025Updated last year
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"☆33Mar 26, 2026Updated 2 months ago
- This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …☆147Feb 8, 2026Updated 4 months ago
- Auditing agents for fine-tuning safety☆21Oct 21, 2025Updated 7 months ago
- The AI that helps you achieve your goals☆11Feb 4, 2024Updated 2 years ago
- Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis.☆13Mar 20, 2025Updated last year
- Fast wavelet transforms on the sphere☆13Dec 20, 2016Updated 9 years ago
- ☆15Jun 2, 2026Updated last week
- ☆15Jan 21, 2025Updated last year
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons☆14Feb 13, 2023Updated 3 years ago
- Automated terminal emulator benchmarks☆24Jun 2, 2026Updated last week
- ☆10Nov 1, 2022Updated 3 years ago
- Sparsify transformers with cross-layer transcoders☆25Nov 14, 2025Updated 6 months ago
- Customizable charts made with TikZ and LaTeX3☆14Feb 11, 2023Updated 3 years ago
- Universe website☆10Mar 3, 2023Updated 3 years ago
- ☆13Jun 4, 2026Updated last week
- ☆14Jul 7, 2024Updated last year
- Flight Recorder lets you record a client program's execution and examine it later☆11Sep 18, 2020Updated 5 years ago
- ☆30Jul 2, 2025Updated 11 months ago
- A curated list of resources dedicated to NLP (papers, blogs, notes, etc.)☆13Nov 30, 2019Updated 6 years ago
- AgentIR is a retriever specialized for Deep Research agents.☆57Apr 16, 2026Updated last month
- ☆10Dec 4, 2024Updated last year
- Transparent Reporting of Ethics for Generative AI (TREGAI) Checklist☆15Oct 16, 2024Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"☆13Jul 18, 2024Updated last year
- An open source deep research clone. AI agent (local LLM or Gemini) that reasons over large amounts of web data extracted with SwiftSoup.☆13Feb 10, 2025Updated last year
- [ACL 2021] Learning to Perturb Word Embeddings for Out-of-distribution QA☆16May 11, 2022Updated 4 years ago
- Code and Data for the ACL 2022 paper "Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling"☆11Apr 5, 2022Updated 4 years ago
- Legal Entity Name Understanding☆22Sep 25, 2025Updated 8 months ago
- Open Data Product Specification 3.0☆10Nov 28, 2024Updated last year