h2oai / h2o-LLM-eval
Large-language Model Evaluation framework with Elo Leaderboard and A-B testing
☆52 · Oct 24, 2024 · Updated last year
Alternatives and similar repositories for h2o-LLM-eval
Users who are interested in h2o-LLM-eval are comparing it to the libraries listed below.
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- Code and data for automatic paraphrase dataset augmentation. ☆11 · Mar 8, 2021 · Updated 4 years ago
- ☆13 · Jul 30, 2024 · Updated last year
- S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering) ☆21 · Nov 4, 2025 · Updated 3 months ago
- ☆17 · Dec 11, 2023 · Updated 2 years ago
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆21 · Aug 13, 2024 · Updated last year
- ☆50 · Apr 10, 2024 · Updated last year
- ☆28 · Nov 10, 2025 · Updated 3 months ago
- ☆37 · Oct 15, 2024 · Updated last year
- Make reasoning models scalable ☆47 · May 31, 2025 · Updated 8 months ago
- ☆28 · Sep 21, 2024 · Updated last year
- Logical inference system based on event semantics and degree semantics in formal semantics ☆11 · Jan 22, 2023 · Updated 3 years ago
- Do Multilingual Language Models Think Better in English? ☆42 · Aug 3, 2023 · Updated 2 years ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 · Jan 15, 2024 · Updated 2 years ago
- GPT API Cost Estimation for Enterprises ☆13 · Oct 24, 2023 · Updated 2 years ago
- A repository aimed at sharing links to climate-related resources. ☆12 · Feb 4, 2026 · Updated last week
- A version of WaPEN with Python-like syntax ☆14 · Feb 8, 2026 · Updated last week
- ☆12 · Jan 11, 2026 · Updated last month
- A framework for few-shot evaluation of autoregressive language models. ☆12 · Jul 14, 2025 · Updated 7 months ago
- A Swedish Natural Language Understanding Benchmark ☆11 · Dec 12, 2025 · Updated 2 months ago
- Evaluation Pipeline for medical tasks. ☆12 · Updated this week
- Python wrapper for the energy system optimization framework IESopt. ☆18 · Updated this week
- [CVPR2024] Learning from Synthetic Human Group Activities ☆14 · Feb 24, 2025 · Updated 11 months ago
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical … ☆14 · Jan 25, 2026 · Updated 2 weeks ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … ☆14 · Dec 12, 2024 · Updated last year
- ☆49 · Aug 6, 2024 · Updated last year
- ☆14 · Nov 12, 2025 · Updated 3 months ago
- ☆12 · Nov 5, 2024 · Updated last year
- Regex-based tail written in Rust ☆10 · Mar 20, 2023 · Updated 2 years ago
- ☆40 · Nov 3, 2023 · Updated 2 years ago
- AI Assistance for Writing Scientific Alt Text ☆14 · Feb 7, 2024 · Updated 2 years ago
- Dataset for training EEG IC classifiers. ☆13 · Aug 29, 2021 · Updated 4 years ago
- Benchmarks for evaluating MT models ☆11 · Jun 26, 2024 · Updated last year
- Code for the paper "Modeling Information Change in Science Communication with Semantically Matched Paraphrases" from EMNLP 2022 ☆13 · Oct 20, 2022 · Updated 3 years ago
- ☆12 · Mar 5, 2025 · Updated 11 months ago
- A comforting microblog ☆14 · Feb 4, 2026 · Updated last week
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- An open source community that focuses on developing and publishing elegant algorithms, models and tools for science big data mining and kn… ☆10 · Jul 27, 2019 · Updated 6 years ago
- Tokyo Metropolitan University Paraphrase Corpus (TMUP) ☆11 · Jun 12, 2017 · Updated 8 years ago