Sample notebooks and prompts for LLM evaluation
☆161 · Nov 2, 2025 · Updated 6 months ago
Alternatives and similar repositories for LLM-Evaluation
Users interested in LLM-Evaluation are comparing it to the libraries listed below.
- A complete guide to evaluating LLMs and RAG systems, covering both theory and code-based approaches. ☆28 · Nov 16, 2023 · Updated 2 years ago
- Perplexity Lite using LangGraph, Tavily, and GPT-4. ☆14 · Jan 11, 2024 · Updated 2 years ago
- ☆25 · Dec 12, 2025 · Updated 5 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆43 · Feb 15, 2024 · Updated 2 years ago
- Real-time data pipeline for AI apps in Azure ☆26 · Dec 5, 2023 · Updated 2 years ago
- SLIM Models by LLMWare. A Streamlit app showing the capabilities for AI Agents and Function Calls. ☆21 · Feb 11, 2024 · Updated 2 years ago
- ☆19 · Dec 8, 2022 · Updated 3 years ago
- Study the temporal performance degradation of machine learning models. ☆16 · Jan 26, 2024 · Updated 2 years ago
- An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design ☆22 · Dec 13, 2024 · Updated last year
- A high-performance Retrieval-Augmented Generation (RAG) solution based on Redis, Qdrant or PostgreSQL. It offers a high-level interface… ☆30 · Jan 6, 2026 · Updated 4 months ago
- Vectorized implementation of a general feedforward neural network in Python ☆10 · Jan 22, 2017 · Updated 9 years ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆87 · Aug 12, 2024 · Updated last year
- Routing with reinforcement learning ☆10 · Apr 9, 2022 · Updated 4 years ago
- ☆19 · Jun 26, 2024 · Updated last year
- An unofficial Georgia Tech theme for JupyterLab ☆10 · Jun 29, 2021 · Updated 4 years ago
- DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing ☆40 · Jan 26, 2024 · Updated 2 years ago
- A tool for evaluating LLMs ☆429 · Mar 15, 2026 · Updated 2 months ago
- ☆29 · Apr 29, 2024 · Updated 2 years ago
- Sample applications built on the Graphlit Platform ☆79 · Oct 11, 2025 · Updated 7 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆103 · Aug 4, 2025 · Updated 9 months ago
- The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey. ☆803 · May 8, 2024 · Updated 2 years ago
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… ☆988 · Nov 22, 2024 · Updated last year
- Evals for agents ☆15 · Dec 4, 2024 · Updated last year
- Build Your Own AI Chatbot with Streamlit and Ollama: A Step-by-Step Tutorial ☆25 · May 1, 2024 · Updated 2 years ago
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,601 · Apr 17, 2026 · Updated last month
- Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stac… ☆256 · Apr 11, 2025 · Updated last year
- Code for ACL 2022 Paper: Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons ☆14 · Dec 22, 2022 · Updated 3 years ago
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- A Python-based processor, built on "long-form-factuality", for easily fact-checking anything. ☆20 · Apr 1, 2024 · Updated 2 years ago
- ☆15 · Jun 5, 2025 · Updated 11 months ago
- Tutorial on probabilistic classification and cost-sensitive learning. ☆13 · Aug 19, 2025 · Updated 9 months ago
- Assorted material (interview questions, books, annotated papers, notes, cheat sheets, etc.) related to ML, AI, Deep Learning and Data Sc… ☆124 · Aug 25, 2025 · Updated 8 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆102 · May 15, 2026 · Updated last week
- 🦖 X—LLM: Cutting Edge & Easy LLM Finetuning ☆408 · Jan 17, 2024 · Updated 2 years ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆61 · Oct 1, 2024 · Updated last year
- The contrastive token loss function for reducing generative repetition of autoregressive neural language models. ☆13 · May 11, 2022 · Updated 4 years ago
- ☆12 · Apr 29, 2022 · Updated 4 years ago
- Code examples and Jupyter notebooks for the Cohere Platform ☆505 · Jan 16, 2025 · Updated last year
- Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs… ☆636 · Nov 24, 2025 · Updated 5 months ago