A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆180 · Mar 1, 2026 · Updated last week
Alternatives and similar repositories for LLMEvaluation
Users who are interested in LLMEvaluation are comparing it to the libraries listed below.
- This compendium reviews significant published research contributions and industrial engineering practices in leveraging Generative AI and… ☆85 · Feb 22, 2026 · Updated 2 weeks ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- ☆13 · May 30, 2024 · Updated last year
- An example project demonstrating how to perform OCR with multi-modal LLMs ☆10 · Mar 14, 2024 · Updated last year
- This is code for How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis ☆14 · Nov 5, 2025 · Updated 4 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆13 · Feb 14, 2024 · Updated 2 years ago
- GitHub Repository for Azure AI-102 Essentials to Learn, Implement, and Certify ☆31 · Feb 11, 2026 · Updated 3 weeks ago
- Speech Recognition for Indic languages. ☆13 · Apr 3, 2021 · Updated 4 years ago
- An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch ☆170 · Dec 15, 2022 · Updated 3 years ago
- 🌍 The open-source Wikipedia of AI — 2M+ apps, agents, LLMs & datasets. Updated daily with tools, tutorials & news. ☆46 · Updated this week
- Experiments to assess SPADE on different LLM pipelines. ☆17 · Apr 7, 2024 · Updated last year
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆317 · Jul 13, 2025 · Updated 7 months ago
- QGEval: A Benchmark for Question Generation Evaluation ☆19 · Nov 7, 2024 · Updated last year
- ↔️ T5 Machine Translation from English to Korean ☆18 · Aug 11, 2022 · Updated 3 years ago
- ☆80 · Jun 5, 2024 · Updated last year
- Tutorials and related talks given by the aeon community ☆19 · Oct 11, 2025 · Updated 4 months ago
- Evaluate your model using advanced prompt strategies ☆21 · Jan 30, 2026 · Updated last month
- Deep Learning Paper Implementations in PyTorch ☆18 · Mar 26, 2025 · Updated 11 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆28 · Apr 17, 2024 · Updated last year
- Language-agnostic BERT Sentence Embedding (LaBSE) Pytorch Model ☆21 · Sep 2, 2020 · Updated 5 years ago
- ☆31 · Feb 18, 2026 · Updated 2 weeks ago
- German dataset for DPR model training ☆19 · Jul 21, 2024 · Updated last year
- ShopME: An E2E fashion recommendation System ☆20 · Jan 19, 2026 · Updated last month
- 🚀 Deep Learning GPU Selector ☆22 · Jun 12, 2025 · Updated 8 months ago
- It's a cooler way to store simple linear models. ☆27 · Jul 15, 2024 · Updated last year
- A minimal yet unstoppable blueprint for multi-agent AI—anchored by the rare, far-reaching “Multi-Agent AI DAO” (2017 Prior Art)—empowerin… ☆32 · Jan 11, 2025 · Updated last year
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning ☆23 · Sep 17, 2024 · Updated last year
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 · Mar 4, 2024 · Updated 2 years ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆29 · Jul 9, 2025 · Updated 8 months ago
- Knowledge pills on Neural Search ☆27 · May 8, 2023 · Updated 2 years ago
- Three examples of recommendation system pipelines with NVIDIA Merlin and Redis ☆72 · Apr 22, 2025 · Updated 10 months ago
- ☆29 · Apr 10, 2025 · Updated 10 months ago
- Evaluation tools shared across anserini, pyserini, and pygaggle ☆35 · Feb 26, 2026 · Updated last week
- SeeGULL is a broad-coverage stereotype dataset in English containing stereotypes about identity groups spanning 178 countries across 8 di… ☆38 · Sep 25, 2023 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 5 months ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆33 · Apr 2, 2025 · Updated 11 months ago
- Minimalistic text and vector search engines that use Scikit-Learn and Pandas ☆41 · Feb 16, 2026 · Updated 2 weeks ago
- ☆33 · May 18, 2025 · Updated 9 months ago
- This project is an AI Recruitment System designed to accelerate the hiring process for HR and technical recruiters. ☆14 · Jan 3, 2025 · Updated last year