A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆193 · Apr 30, 2026 · Updated last month
Alternatives and similar repositories for LLMEvaluation
Users that are interested in LLMEvaluation are comparing it to the libraries listed below.
- A framework for few-shot evaluation of autoregressive language models. ☆13 · Feb 14, 2024 · Updated 2 years ago
- A project for implementing ML and NLP papers ☆13 · May 22, 2020 · Updated 6 years ago
- First token cutoff sampling inference example ☆30 · Jan 15, 2024 · Updated 2 years ago
- This is code for How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis ☆18 · Nov 5, 2025 · Updated 7 months ago
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆316 · Jul 13, 2025 · Updated 11 months ago
- Table detection with Florence. ☆15 · Jul 11, 2024 · Updated last year
- ☆17 · Nov 23, 2023 · Updated 2 years ago
- ☆79 · Jun 5, 2024 · Updated 2 years ago
- Inspect: A framework for large language model evaluations ☆2,215 · Updated this week
- ☆34 · May 19, 2026 · Updated 3 weeks ago
- It's a cooler way to store simple linear models. ☆26 · Jul 15, 2024 · Updated last year
- An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch ☆172 · Dec 15, 2022 · Updated 3 years ago
- Experiments to assess SPADE on different LLM pipelines. ☆17 · Apr 7, 2024 · Updated 2 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- Notebooks demonstrating example applications of the cleanvision library ☆17 · Dec 16, 2025 · Updated 6 months ago
- Deploy automl models for tabular tasks on AWS Sagemaker with AutoGluon ☆13 · Feb 28, 2020 · Updated 6 years ago
- My configuration files, loosely inspired by @sontek ☆38 · May 28, 2026 · Updated 3 weeks ago
- mcp wrapper for openai built-in tools ☆12 · Mar 13, 2025 · Updated last year
- Universal LLM security auditor with automated jailbreak testing, DSPy optimization, and OWASP 2025-aligned attack patterns ☆21 · Oct 23, 2025 · Updated 7 months ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆34 · Apr 2, 2025 · Updated last year
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- SeeGULL is a broad-coverage stereotype dataset in English containing stereotypes about identity groups spanning 178 countries across 8 di…