A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆188 · Apr 30, 2026 · Updated last week
Alternatives and similar repositories for LLMEvaluation
Users who are interested in LLMEvaluation are comparing it to the libraries listed below.
- A framework for few-shot evaluation of autoregressive language models. ☆13 · Feb 14, 2024 · Updated 2 years ago
- First token cutoff sampling inference example ☆30 · Jan 15, 2024 · Updated 2 years ago
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆317 · Jul 13, 2025 · Updated 9 months ago
- Table detection with Florence. ☆15 · Jul 11, 2024 · Updated last year
- Awesome Bangla Datasets ☆43 · Mar 29, 2025 · Updated last year
- ☆16 · Nov 23, 2023 · Updated 2 years ago
- ☆80 · Jun 5, 2024 · Updated last year
- Inspect: A framework for large language model evaluations ☆2,000 · Updated this week
- ↔️ T5 Machine Translation from English to Korean ☆18 · Aug 11, 2022 · Updated 3 years ago
- Experiments to assess SPADE on different LLM pipelines. ☆17 · Apr 7, 2024 · Updated 2 years ago
- Deploy AutoML models for tabular tasks on AWS SageMaker with AutoGluon ☆13 · Feb 28, 2020 · Updated 6 years ago
- A tool that can be used to measure the sequential performance of any OpenAI-compatible LLM API ☆24 · Aug 1, 2024 · Updated last year
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- Python library for evaluation ☆17 · Mar 31, 2026 · Updated last month
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆29 · Jul 9, 2025 · Updated 10 months ago
- LLM query engine to retrieve augmented responses from JSON files. ☆15 · Oct 12, 2023 · Updated 2 years ago
- Cookbooks showcasing various applications of Cleanlab ☆22 · Jan 20, 2026 · Updated 3 months ago
- Python client library for Cleanlab Trustworthy Language Model ☆24 · Dec 9, 2025 · Updated 5 months ago
- Implementation of NAACL'25 "Empowering Retrieval-based Conversational Recommendation with Contrasting User Preferences" ☆14 · Sep 9, 2025 · Updated 8 months ago
- ☆11 · Nov 12, 2020 · Updated 5 years ago
- Evaluate your model using advanced prompt strategies ☆21 · Jan 30, 2026 · Updated 3 months ago
- This is the official code for the EMNLP findings 2025 paper "Enhancing Time Awareness in Generative Recommendation". ☆17 · Aug 30, 2025 · Updated 8 months ago
- Text Summarization in PyTorch. The aim of this project is to build a text summarizer that summarizes Amazon reviews. ☆10 · Jun 7, 2018 · Updated 7 years ago
- German dataset for DPR model training ☆19 · Jul 21, 2024 · Updated last year
- This hands-on lab walks you through fine-tuning an open source LLM on Azure and serving the fine-tuned model on Azure. It is intended for Dat… ☆60 · Mar 17, 2025 · Updated last year
- Code for "Approaching Deep Learning through the Spectral Dynamics of Weights" ☆13 · Oct 30, 2024 · Updated last year
- ICDE 2025 paper: Grounding Natural Language to SQL Translation with Data-Based Self-Explanations ☆17 · May 24, 2025 · Updated 11 months ago
- ☆31 · Nov 8, 2023 · Updated 2 years ago
- The official repository for MGFiD (NAACL 2024 Findings) ☆15 · Jul 27, 2024 · Updated last year
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆2,100 · Dec 3, 2025 · Updated 5 months ago
- An R library for estimating causal effects ☆12 · Apr 25, 2025 · Updated last year
- ☆17 · Feb 8, 2025 · Updated last year
- 3-Pipeline LLMOps financial advisor. Streaming pipeline deployed on AWS collects and embeds live data into QdrantDB 24/7. Training pipeline … ☆26 · Apr 12, 2025 · Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆29 · Apr 17, 2024 · Updated 2 years ago
- Create embeddings for LLM using the Nomic API ☆23 · Nov 21, 2024 · Updated last year
- This code package is related to the following scientific article: Andrea Pizzo, Daniel Verenzuela, Luca Sanguinetti and Emil Björnson,… ☆14 · May 18, 2018 · Updated 7 years ago
- A minimal yet unstoppable blueprint for multi-agent AI, anchored by the rare, far-reaching “Multi-Agent AI DAO” (2017 Prior Art), empowerin… ☆36 · Jan 11, 2025 · Updated last year
- A library to create lore plots (logistic regression of the prevalence of a categorical variable as a function of a continuous feature) ☆18 · May 1, 2026 · Updated last week
- awesome synthetic (text) datasets ☆330 · Jan 8, 2026 · Updated 4 months ago