Aleph-Alpha-Research/eval-framework

Comprehensive LLM evaluation at scale: A production-ready framework for evaluating large language models across multiple benchmarks.

☆42

Alternatives and similar repositories for eval-framework

Users who are interested in eval-framework are comparing it to the libraries listed below.

Aleph-Alpha-Research / scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…
☆66 · Nov 18, 2025 · Updated 8 months ago
Aleph-Alpha / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆21 · Feb 1, 2026 · Updated 5 months ago
lhoestq / hfjobs
Hugging Face Jobs
☆20 · Jul 11, 2025 · Updated last year
allenai / duplodocus
Tooling for exact and MinHash deduplication of large-scale text datasets
☆90 · Mar 24, 2026 · Updated 3 months ago
JiaQiSJTU / VisionInText
A benchmark on visual perception in text strings for both LLMs and MLLMs.
☆15 · Apr 7, 2026 · Updated 3 months ago
baixianghuang / authorship-llm
Can Large Language Models Identify Authorship? (EMNLP 2024 Findings)
☆13 · Feb 4, 2025 · Updated last year
allenai / datamap-rs
Data mapping framework for rust stuff
☆56 · Mar 25, 2026 · Updated 3 months ago
devrimcavusoglu / acl-bib-overleaf
Split bib files for anthology bibliography for overleaf
☆11 · Aug 25, 2024 · Updated last year
DanAnastasyev / GramEval2020
1st place solution for GramEval-2020
☆14 · Jan 13, 2023 · Updated 3 years ago
huggingface / Microsoft-Azure
Hugging Face on Microsoft Azure (documentation, examples and more)
☆15 · Jul 10, 2026 · Updated last week
wswu / yawipa
A comprehensive and extensible Wiktionary parsing framework.
☆25 · Sep 5, 2024 · Updated last year
sbera7 / Dialogue-act-classification
Dialogue Act classification
☆18 · Jan 15, 2024 · Updated 2 years ago
catherinearnett / morphscore
This is the repository for MorphScore, a tokenizer evaluation framework for morphological alignment.
☆17 · Jul 10, 2025 · Updated last year
allenai / olmes
Reproducible, flexible LLM evaluations
☆388 · Mar 24, 2026 · Updated 3 months ago
koaning / icepickle
It's a cooler way to store simple linear models.
☆26 · Jul 15, 2024 · Updated 2 years ago
UtrechtUniversity / UU-dissertation-template
This is a Utrecht University dissertation template for LaTeX
☆22 · Jul 31, 2025 · Updated 11 months ago
bigscience-workshop / metadata
Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.
☆29 · Jun 12, 2023 · Updated 3 years ago
allenai / decon
decontamination
☆35 · Mar 4, 2026 · Updated 4 months ago
lizaku / vec2graph
Mini-library for producing graph visualizations from embedding models
☆28 · Sep 10, 2020 · Updated 5 years ago
simonw / sqlite-fts5-trigram
Trigram tokenizer module for SQLite FTS5
☆14 · Feb 22, 2021 · Updated 5 years ago
lucidrains / sdft-pytorch
Explorations into the proposed SDFT, Self-Distillation Enables Continual Learning, from Shenfeld et al. of MIT
☆31 · Feb 6, 2026 · Updated 5 months ago
easonnie / ChaosNLI
[EMNLP 2020] Collective HumAn OpinionS on Natural Language Inference Data
☆42 · Apr 7, 2022 · Updated 4 years ago
Aleph-Alpha / Alpha-MoE
☆66 · Dec 10, 2025 · Updated 7 months ago
zouharvi / tokenization-scorer
Simple-to-use scoring function for arbitrarily tokenized texts.
☆48 · Feb 19, 2025 · Updated last year
Immortalin / Simulacra
Simple and Ideal Circuit Simulation
☆13 · Dec 4, 2017 · Updated 8 years ago
interstellarninja / function-calling-eval
A framework for evaluating function calls made by LLMs
☆41 · Jul 23, 2024 · Updated last year
HKUST-KnowComp / Visual_PCR
Dataset and Source code for EMNLP 2019 paper "What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues"
☆26 · Sep 10, 2021 · Updated 4 years ago
shubhamagarwal92 / visdial_conv
This repository contains code used in our ACL'20 paper History for Visual Dialog: Do we really need it?
☆33 · Mar 24, 2023 · Updated 3 years ago
davidberenstein1957 / dataset-viber
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
☆47 · Sep 5, 2024 · Updated last year
nitin966 / OpenEncompass
A flexible framework for AI agents that separates workflow logic from search strategy. Based on https://arxiv.org/pdf/2512.03571 Also rea…
☆24 · Feb 10, 2026 · Updated 5 months ago
apple / visatronic-demo
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis
☆15 · May 28, 2025 · Updated last year
allenai / olmix
☆41 · May 26, 2026 · Updated last month
Emperor-WS / PyEmber
An Educational Framework Based on PyTorch for Deep Learning Education and Exploration
☆11 · Dec 24, 2023 · Updated 2 years ago
oceanumeric / EnteRAG
A RAG that can scale 🧑🏻‍💻
☆11 · May 28, 2024 · Updated 2 years ago
martinchristen / pyconde2023
Notebooks for the tutorial of the PyCon.DE 2023 Conference
☆14 · Apr 18, 2023 · Updated 3 years ago
vincentamato / mlx-coconut
An MLX port of Meta's Coconut reasoning model
☆16 · Sep 2, 2025 · Updated 10 months ago
thad0ctor / KrunchWrapper
☆18 · Jul 1, 2025 · Updated last year
caiqizh / LUQ
☆14 · Jan 14, 2026 · Updated 6 months ago
ProofAgent-ai / proofagent-harness
Open-source test harness for AI agents. Stress-test production agents with adversarial multi-turn scenarios in CI
☆16 · Jul 13, 2026 · Updated last week