rajshah4/LLM-Evaluation

Sample notebooks and prompts for LLM evaluation

☆175

Alternatives and similar repositories for LLM-Evaluation

Users who are interested in LLM-Evaluation are comparing it to the libraries listed below.

AIAnytime / Evaluation-of-LLMs-and-RAGs
A complete guide to evaluating LLMs and RAGs, covering both theory- and code-based approaches.
☆28 · Nov 16, 2023 · Updated 2 years ago
AIAnytime / Perplexity-Lite
Perplexity Lite using LangGraph, Tavily, and GPT-4.
☆14 · Jan 11, 2024 · Updated 2 years ago
kevinyaobytedance / llm_eval
LLM evaluation.
☆16 · Nov 7, 2023 · Updated 2 years ago
al-mz / insight-copilot
An open-source template that enables natural language querying and real-time data visualization for structured datasets.
☆26 · Jun 24, 2025 · Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆43 · Feb 15, 2024 · Updated 2 years ago
AIAnytime / SLIM-Models-by-LLMWare
SLIM Models by LLMWare: a Streamlit app showcasing their capabilities for AI agents and function calls.
☆21 · Feb 11, 2024 · Updated 2 years ago
whylabs / langkit
🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa…
☆993 · Nov 22, 2024 · Updated last year
lgraesser / NeuralNetwork
Vectorized implementation of a general feedforward neural network in Python
☆10 · Jan 22, 2017 · Updated 9 years ago
keitazoumana / LLMs
Repository for my LLM notebooks
☆30 · Aug 8, 2024 · Updated last year
tomaarsen / nltk_theme
Sphinx theme for NLTK
☆17 · Nov 7, 2021 · Updated 4 years ago
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆181 · Jul 4, 2026 · Updated 2 weeks ago
jacobmarks / ten-weeks-of-plugins
My journey during 10 weeks of building FiftyOne plugins
☆22 · Nov 12, 2023 · Updated 2 years ago
Atcold / cs-video-courses
List of Computer Science courses with video lectures.
☆27 · Feb 17, 2022 · Updated 4 years ago
ZhangYuanhan-AI / CelebASpoofChallengeSubmissionExample
This is a submission example for CelebA-Spoof Challenge participants.
☆10 · Sep 8, 2020 · Updated 5 years ago
JINO-ROHIT / Tune-RAG-Parameters-With-LlamaIndex
☆18 · Jun 26, 2024 · Updated 2 years ago
jxnl / mit-lecture
☆10 · Feb 25, 2025 · Updated last year
MLGroupJLU / LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
☆1,609 · Apr 17, 2026 · Updated 3 months ago
shreyashji / Dsa-essential
This is a code repository for data structures and algorithms, with code pushed from my local repo to GitHub for learning and easy access and to…
☆11 · Nov 21, 2021 · Updated 4 years ago
apriantoa917 / Python-Latihan-DTS-2019
This is a collection of lab materials from the Digitalent Scholarship Python Essentials 2019.
☆10 · Mar 27, 2022 · Updated 4 years ago
AI-ANK / Airbnb-Listing-Explorer
☆29 · Apr 29, 2024 · Updated 2 years ago
terryyz / llm-benchmark
A list of LLM benchmark frameworks.
☆75 · Feb 17, 2024 · Updated 2 years ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆130 · Jul 28, 2024 · Updated last year
marjanstoimchev / tpa-cnn
Learning to Combine Local and Global Image Information for Contactless Palmprint Recognition
☆11 · Dec 7, 2021 · Updated 4 years ago
chanmuzi / NLP-Paper-News
A list of NLP papers and news I've checked, some with short descriptions (abstracts) in Korean.
☆38 · Updated this week
vibrantlabsai / ragas
Supercharge Your LLM Application Evaluations 🚀
☆14,970 · Feb 24, 2026 · Updated 4 months ago
graphlit / graphlit-samples
Sample applications built on the Graphlit Platform
☆78 · Oct 11, 2025 · Updated 9 months ago
onejune2018 / Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs…
☆653 · Nov 24, 2025 · Updated 8 months ago
drivendataorg / tissuenet-cervical-biopsies
Winners of the TissueNet: Detect Lesions in Cervical Biopsies competition
☆22 · Sep 7, 2023 · Updated 2 years ago
m-yoshiro / storybook-mcp
☆21 · Apr 15, 2025 · Updated last year
PacificAI / langtest
Deliver safe & effective language models
☆563 · Updated this week
JStehouwer / GOAS_CVPR2020
☆15 · Apr 1, 2020 · Updated 6 years ago
McGill-NLP / feedbackqa
FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback
☆12 · Jul 13, 2022 · Updated 4 years ago
gilinachum / bedrock-latency
Tools to measure LLM latency in Amazon Bedrock
☆22 · Jan 20, 2026 · Updated 6 months ago
causalNLP / cladder
We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs.
☆146 · May 29, 2024 · Updated 2 years ago
ammarahhashmi / Multimodal-Forgery-Detection-Using-Ensemble-Learning
This repository contains the official implementation (PyTorch) of "Multimodal Forgery Detection Using Ensemble Learning" proposed in APSI…
☆10 · Jan 4, 2023 · Updated 3 years ago
OpenM3D / M3DBench
[ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.
☆61 · Oct 1, 2024 · Updated last year
langchain-ai / langchain-upstage
☆16 · Updated this week
AlexIoannides / llm-regression
Exploring the classical regression capabilities of LLMs.
☆18 · May 20, 2024 · Updated 2 years ago
aws-samples / llm-apps-workshop
Use LLMs for building real-world apps
☆113 · Jan 17, 2025 · Updated last year