A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for different use cases, promote best practices in LLM assessment, and critically assess how effective these evaluation methods are.
☆181 · Mar 6, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for LLMEvaluation
Users who are interested in LLMEvaluation are comparing it to the libraries listed below.
- Bangla TTS inference pipeline using Vit TTS · ☆13 · Mar 24, 2024 · Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models · ☆13 · Feb 14, 2024 · Updated 2 years ago
- [ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". … · ☆21 · Nov 17, 2025 · Updated 4 months ago
- First token cutoff sampling inference example · ☆30 · Jan 15, 2024 · Updated 2 years ago
- Code for "How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis" · ☆16 · Nov 5, 2025 · Updated 4 months ago
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs · ☆317 · Jul 13, 2025 · Updated 8 months ago
- Table detection with Florence · ☆15 · Jul 11, 2024 · Updated last year
- Awesome Bangla Datasets · ☆40 · Mar 29, 2025 · Updated last year
- ☆16 · Nov 23, 2023 · Updated 2 years ago
- Inspect: A framework for large language model evaluations · ☆1,851 · Updated this week
- ☆24 · Dec 12, 2024 · Updated last year
- Evaluation tools shared across anserini, pyserini, and pygaggle · ☆35 · Mar 19, 2026 · Updated last week
- ↔️ T5 Machine Translation from English to Korean · ☆18 · Aug 11, 2022 · Updated 3 years ago
- An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch · ☆171 · Dec 15, 2022 · Updated 3 years ago
- Experiments to assess SPADE on different LLM pipelines · ☆17 · Apr 7, 2024 · Updated last year
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch · ☆10 · Aug 7, 2024 · Updated last year
- The goal of this repository is to accelerate Azure OpenAI service adoption and put an enterprise governance structure around it using Azu… · ☆12 · Sep 13, 2023 · Updated 2 years ago
- MCP wrapper for OpenAI built-in tools · ☆12 · Mar 13, 2025 · Updated last year
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning · ☆13 · Sep 2, 2024 · Updated last year
- ZYN: Zero-Shot Reward Models with Yes-No Questions · ☆35 · Aug 15, 2023 · Updated 2 years ago
- The GopherCon 2021 "Production AI with Go" workshop materials · ☆13 · Dec 6, 2021 · Updated 4 years ago
- LLM query engine to retrieve augmented responses from JSON files · ☆15 · Oct 12, 2023 · Updated 2 years ago
- SQLite integration via WASM · ☆12 · Dec 5, 2023 · Updated 2 years ago
- Data visualizations and R code for #TidyTuesday 2021 · ☆16 · Feb 4, 2022 · Updated 4 years ago
- Python client library for Cleanlab Trustworthy Language Model · ☆24 · Dec 9, 2025 · Updated 3 months ago
- Implementation of NAACL'25 "Empowering Retrieval-based Conversational Recommendation with Contrasting User Preferences" · ☆14 · Sep 9, 2025 · Updated 6 months ago
- An automated data pipeline scaling RL to pretraining levels · ☆74 · Oct 11, 2025 · Updated 5 months ago
- ☆12 · Feb 16, 2024 · Updated 2 years ago
- Evaluate your model using advanced prompt strategies · ☆21 · Jan 30, 2026 · Updated 2 months ago
- Official code for the EMNLP 2025 Findings paper "Enhancing Time Awareness in Generative Recommendation" · ☆17 · Aug 30, 2025 · Updated 7 months ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… · ☆23 · Mar 4, 2024 · Updated 2 years ago
- ☆14 · Mar 6, 2018 · Updated 8 years ago
- Source code and data for ADEPT: A DEbiasing PrompT Framework (AAAI-23) · ☆15 · Dec 13, 2024 · Updated last year
- ☆14 · Jun 25, 2024 · Updated last year
- Your local personalised AI agent · ☆43 · Nov 27, 2024 · Updated last year
- German dataset for DPR model training · ☆19 · Jul 21, 2024 · Updated last year
- This hands-on walks you through fine-tuning an open source LLM on Azure and serving the fine-tuned model on Azure. It is intended for Dat… · ☆59 · Mar 17, 2025 · Updated last year
- Select non-rectangular ROI in video · ☆11 · Apr 18, 2018 · Updated 7 years ago
- [EMNLP 2024 Industry track] MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank P… · ☆14 · Mar 4, 2025 · Updated last year