Sample notebooks and prompts for LLM evaluation
☆173 · Nov 2, 2025 · Updated 7 months ago
Alternatives and similar repositories for LLM-Evaluation
Users that are interested in LLM-Evaluation are comparing it to the libraries listed below.
- A simple repository showcasing a few LLM evaluation strategies and leveraging W&B Sweeps to optimize the LLM system. ☆12 · Jul 11, 2023 · Updated 2 years ago
- A complete guide to evaluate LLMs and RAGs. Both theory and code based approaches covered. ☆28 · Nov 16, 2023 · Updated 2 years ago
- This repository contains multi-modal speech data for African languages that can be used to train ASR and NLP models ☆17 · Aug 31, 2022 · Updated 3 years ago
- Perplexity Lite using Langgraph, Tavily, and GPT-4. ☆14 · Jan 11, 2024 · Updated 2 years ago
- Learn how to use Transformer-based models for named-entity recognition (NER) tasks and how to analyze various model features, constraints… ☆17 · Jun 29, 2022 · Updated 4 years ago
- LLM evaluation. ☆16 · Nov 7, 2023 · Updated 2 years ago
- This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, collaborative catalogue of LLM evaluation fr… ☆21 · Nov 16, 2023 · Updated 2 years ago
- ☆25 · Dec 12, 2025 · Updated 6 months ago
- Real-time data pipeline for AI apps in Azure ☆26 · Dec 5, 2023 · Updated 2 years ago
- SLIM Models by LLMWare. A Streamlit app showing the capabilities of AI Agents and Function Calls. ☆21 · Feb 11, 2024 · Updated 2 years ago
- Study the temporal performance degradation of machine learning models. ☆16 · Jan 26, 2024 · Updated 2 years ago
- Repository for my LLM notebooks ☆30 · Aug 8, 2024 · Updated last year
- Large Language Model evaluation framework with Elo Leaderboard and A-B testing ☆52 · Oct 24, 2024 · Updated last year
- Estimates fatigue loads in wind turbines from SCADA data based on supervised learning. ☆10 · Sep 11, 2018 · Updated 7 years ago
- Evaluating LLMs with fewer examples ☆179 · Apr 12, 2024 · Updated 2 years ago
- My journey during 10 weeks of building FiftyOne plugins ☆22 · Nov 12, 2023 · Updated 2 years ago
- This is a submission example for CelebA-Spoof Challenge participants. ☆10 · Sep 8, 2020 · Updated 5 years ago
- ComfyUI node for modular, human‑like Kani TTS. Generate natural, high‑quality speech from text ☆38 · Oct 17, 2025 · Updated 8 months ago
- ☆19 · Jun 26, 2024 · Updated 2 years ago
- ☆17 · Apr 24, 2024 · Updated 2 years ago
- A tool for evaluating LLMs ☆428 · Mar 15, 2026 · Updated 3 months ago
- A list of LLM benchmark frameworks. ☆74 · Feb 17, 2024 · Updated 2 years ago
- ☆29 · Apr 29, 2024 · Updated 2 years ago
- Just a bunch of benchmark logs for different LLMs ☆130 · Jul 28, 2024 · Updated last year
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images. ☆35 · Dec 27, 2023 · Updated 2 years ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆64 · Mar 26, 2024 · Updated 2 years ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆102 · Aug 4, 2025 · Updated 10 months ago
- Large Scale Benchmark of Large Language Models on African Languages ☆19 · Jul 28, 2025 · Updated 11 months ago
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… ☆992 · Nov 22, 2024 · Updated last year
- Evals for agents ☆15 · Dec 4, 2024 · Updated last year
- ☆23 · Jul 10, 2025 · Updated 11 months ago
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,603 · Apr 17, 2026 · Updated 2 months ago
- This project aims to extract ROIs such as the fingertip, palmprint, and hand geometry from a single hand image. ☆10 · Aug 24, 2023 · Updated 2 years ago
- Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stac… ☆256 · Apr 11, 2025 · Updated last year
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- ☆28 · Feb 11, 2026 · Updated 4 months ago
- A collection of lab materials from the Digitalent Scholarship Python Essentials 2019 ☆10 · Mar 27, 2022 · Updated 4 years ago
- ☆10 · Jun 29, 2022 · Updated 4 years ago
- Script Center for System Center Configuration Manager ☆12 · Jul 20, 2023 · Updated 2 years ago