A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
☆185 · Mar 6, 2026 · Updated last month
Alternatives and similar repositories for LLMEvaluation
Users who are interested in LLMEvaluation are comparing it to the libraries listed below.
- Bangla TTS Inference pipeline using Vit TTS ☆12 · Mar 24, 2024 · Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆13 · Feb 14, 2024 · Updated 2 years ago
- First token cutoff sampling inference example ☆30 · Jan 15, 2024 · Updated 2 years ago
- Chance-corrected Agreement Coefficients ☆29 · Nov 13, 2024 · Updated last year
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆318 · Jul 13, 2025 · Updated 9 months ago
- Table detection with Florence. ☆15 · Jul 11, 2024 · Updated last year
- Inspect: A framework for large language model evaluations ☆1,904 · Updated this week
- It's a cooler way to store simple linear models. ☆26 · Jul 15, 2024 · Updated last year
- ↔️ T5 Machine Translation from English to Korean ☆18 · Aug 11, 2022 · Updated 3 years ago
- Experimental CUDA kernel framework unifying typed dimensions, NVRTC JIT specialization, and ML-guided tuning. ☆46 · Feb 9, 2026 · Updated 2 months ago
- An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch ☆172 · Dec 15, 2022 · Updated 3 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- GitHub Repository for Azure AI-102 Essentials to Learn, Implement, and Certify ☆33 · Feb 11, 2026 · Updated 2 months ago
- A tool that can be used to measure the sequential performance of any OpenAI-compatible LLM API ☆24 · Aug 1, 2024 · Updated last year
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆34 · Apr 2, 2025 · Updated last year
- Universal LLM security auditor with automated jailbreak testing, DSPy optimization, and OWASP 2025-aligned attack patterns ☆21 · Oct 23, 2025 · Updated 5 months ago
- Speech Recognition for Indic languages. ☆13 · Apr 3, 2021 · Updated 5 years ago
- SQLite integration via WASM ☆12 · Dec 5, 2023 · Updated 2 years ago
- Designing a Dashboard for Transparency and Control of Conversational AI, https://arxiv.org/abs/2406.07882 ☆37 · Oct 7, 2025 · Updated 6 months ago
- ☆11 · Nov 12, 2020 · Updated 5 years ago
- ☆160 · Sep 12, 2023 · Updated 2 years ago
- ☆12 · Feb 16, 2024 · Updated 2 years ago
- An automated data pipeline scaling RL to pretraining levels ☆75 · Oct 11, 2025 · Updated 6 months ago
- A cookiecutter template for Python agent projects that use uv for dependency management ☆30 · Mar 13, 2026 · Updated last month
- ☆24 · May 24, 2025 · Updated 10 months ago
- Evaluate your model using advanced prompt strategies ☆21 · Jan 30, 2026 · Updated 2 months ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 · Mar 4, 2024 · Updated 2 years ago
- ☆17 · Jul 23, 2025 · Updated 8 months ago
- Source code and data for ADEPT: A DEbiasing PrompT Framework (AAAI-23). ☆15 · Dec 13, 2024 · Updated last year
- Your local personalised AI agent ☆43 · Nov 27, 2024 · Updated last year
- German dataset for DPR model training ☆19 · Jul 21, 2024 · Updated last year
- Code for "Approaching Deep Learning through the Spectral Dynamics of Weights" ☆13 · Oct 30, 2024 · Updated last year
- ☆31 · Nov 8, 2023 · Updated 2 years ago
- A minimal yet unstoppable blueprint for multi-agent AI, anchored by the rare, far-reaching “Multi-Agent AI DAO” (2017 Prior Art), empowerin… ☆32 · Jan 11, 2025 · Updated last year
- The official repository for MGFiD (NAACL 2024 Findings) ☆15 · Jul 27, 2024 · Updated last year
- Wrap-up around RinteRface templates ☆11 · Apr 10, 2019 · Updated 7 years ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆2,090 · Dec 3, 2025 · Updated 4 months ago
- An R library for estimating causal effects ☆12 · Apr 25, 2025 · Updated 11 months ago
- ☆17 · Feb 8, 2025 · Updated last year