A list of LLM benchmark frameworks.
☆74 · Feb 17, 2024 · Updated 2 years ago
Alternatives and similar repositories for llm-benchmark
Users that are interested in llm-benchmark are comparing it to the libraries listed below.
- LLM evaluation. ☆16 · Nov 7, 2023 · Updated 2 years ago
- A complete guide to evaluating LLMs and RAGs. Both theory and code-based approaches are covered. ☆28 · Nov 16, 2023 · Updated 2 years ago
- Provides RNA and DNA Foundation Model Benchmarks and Applications ☆29 · Nov 26, 2025 · Updated 5 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆43 · Feb 15, 2024 · Updated 2 years ago
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models" ☆104 · Jun 15, 2023 · Updated 2 years ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆64 · Mar 26, 2024 · Updated 2 years ago
- Alternative way to calculate self-attention ☆18 · May 25, 2024 · Updated last year
- Winter School 2018 - Deep Learning ☆13 · Sep 18, 2018 · Updated 7 years ago
- Visual question answering prompting recipes for large vision-language models ☆28 · Sep 14, 2024 · Updated last year
- ☆48 · Sep 7, 2024 · Updated last year
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022 ☆31 · May 29, 2023 · Updated 2 years ago
- ☆10 · Mar 1, 2023 · Updated 3 years ago
- ☆12 · Jan 31, 2024 · Updated 2 years ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆17 · Mar 31, 2025 · Updated last year
- Codebase used to build NREL's National Thermal Generator Performance Database ☆15 · Mar 17, 2021 · Updated 5 years ago
- ☆14 · May 26, 2021 · Updated 4 years ago
- Local emulator for Hugging Face Inference Endpoints custom handlers ☆27 · Apr 3, 2026 · Updated last month
- Just a bunch of benchmark logs for different LLMs ☆124 · Jul 28, 2024 · Updated last year
- Docker-powered container for using Nginx as a reverse proxy in combination with an OpenVPN client. ☆11 · Jan 1, 2020 · Updated 6 years ago
- leveldb FUSE filesystem ☆17 · May 15, 2014 · Updated 11 years ago
- Intake-esm Datastore ☆13 · Apr 27, 2026 · Updated last week
- Gaussian imputation of GWAS summary statistics ☆15 · Apr 16, 2015 · Updated 11 years ago
- Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models ☆91 · Apr 4, 2024 · Updated 2 years ago
- Edward Elric's Blog ☆15 · Feb 5, 2026 · Updated 3 months ago
- A Python implementation demonstrating three fundamental linked list techniques with clear examples and detailed explanations. Features Mu… ☆20 · Jul 10, 2025 · Updated 9 months ago
- ☆15 · Sep 30, 2022 · Updated 3 years ago
- ☆48 · Aug 5, 2025 · Updated 9 months ago
- A Scenario Creator ☆12 · Jun 12, 2024 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆95 · Mar 21, 2024 · Updated 2 years ago
- Multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024) ☆426 · Oct 25, 2025 · Updated 6 months ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆25 · Oct 7, 2025 · Updated 6 months ago
- Replication of "Regularizing and Optimizing LSTM Language Models" by Merity et al. (2017). ☆12 · Sep 17, 2019 · Updated 6 years ago
- Small, simple agent task environments for training and evaluation ☆19 · Nov 1, 2024 · Updated last year
- ☆16 · Mar 18, 2026 · Updated last month
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆189 · Dec 23, 2023 · Updated 2 years ago
- ☆12 · Feb 16, 2024 · Updated 2 years ago
- Reference implementation for the climate segmentation benchmark, based on the Exascale Deep Learning for Climate Analytics work ☆10 · May 6, 2020 · Updated 6 years ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆15 · Oct 16, 2023 · Updated 2 years ago
- Deepmark AI enables a unique testing environment for language models (LLM) assessment on task-specific metrics and on your own data so yo… ☆104 · Nov 24, 2023 · Updated 2 years ago