This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, collaborative catalogue of LLM evaluation frameworks, benchmarks and papers.
☆20Nov 16, 2023Updated 2 years ago
Alternatives and similar repositories for LLM-Evals-Catalogue
Users who are interested in LLM-Evals-Catalogue are comparing it to the libraries listed below.
- A simple repository that showcases a few LLM evaluation strategies and leverages W&B Sweeps to optimize the LLM system.☆12Jul 11, 2023Updated 2 years ago
- Estimates fatigue loads in wind turbines from SCADA data based on supervised learning.☆10Sep 11, 2018Updated 7 years ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.☆25Jan 26, 2024Updated 2 years ago
- Hubcap is an autonomous AI agent in 25 lines of code: a small Autobot that you can't trust. *This is the Python fork/port* from https://g…☆22Nov 10, 2025Updated 6 months ago
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application.☆326Feb 5, 2026Updated 3 months ago
- I know Kung Fu☆25Mar 27, 2025Updated last year
- Physics-guided data-driven solutions for the wind energy industry☆28Jan 7, 2026Updated 4 months ago
- A Python tool for visualizing satellite positions using TLE (Two Line Element) data☆12May 1, 2022Updated 4 years ago
- Code and supplementary material complementing the WES-publication: "Change-point detection in wind turbine SCADA data for robust conditio…☆20Sep 2, 2021Updated 4 years ago
- Guidelines for the responsible use of explainable AI and machine learning.☆17Jan 30, 2023Updated 3 years ago
- ☆15Nov 25, 2025Updated 6 months ago
- Comparison of Metaflow, MLFlow and DVC☆14Aug 4, 2021Updated 4 years ago
- mySociety code common to several projects☆24Apr 16, 2026Updated last month
- [EMNLP 2022] Code for our paper “ZeroGen: Efficient Zero-shot Learning via Dataset Generation”.☆47Feb 18, 2022Updated 4 years ago
- A place for ideas and drafts related to GAI risk management.☆22Jan 17, 2026Updated 4 months ago
- A web app to search PubMed☆12Jul 8, 2024Updated last year
- A proof-of-concept gaze detection Android app☆16Jan 10, 2017Updated 9 years ago
- Preprint/draft article/blog on some explainable machine learning misconceptions. WIP!☆29Jul 13, 2019Updated 6 years ago
- Example showing unstructured.io + timescaledb + PGAI☆18Nov 15, 2024Updated last year
- Using Machine Learning to Measure Job Skill Similarities - See more at: http://blog.nycdatascience.com/?p=11683&preview=true#sthash.NnPZZ…☆18Jun 20, 2016Updated 9 years ago
- A directory of practical and usable AI agents resources from applications and platforms to frameworks and utilities and other parts of th…☆35Apr 28, 2026Updated last month
- Resources for exploring Generative Feedback Loops with Weaviate!☆39Apr 22, 2025Updated last year
- WhatsApp chatbot with Dialogflow and Twilio API☆10May 6, 2024Updated 2 years ago
- A collection of data science examples implemented across a variety of languages and libraries.☆34Jan 14, 2016Updated 10 years ago
- Churn analysis library☆22Dec 18, 2025Updated 5 months ago
- Code for paper "AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents"☆19Jun 12, 2023Updated 2 years ago
- [NeurIPS 2022] Explaining Graph Neural Networks with Structure-Aware Cooperative Games (GStarX)