Large Language Model evaluation framework with Elo leaderboard and A/B testing
☆52 · Oct 24, 2024 · Updated last year
Alternatives and similar repositories for h2o-LLM-eval
Users interested in h2o-LLM-eval are comparing it to the libraries listed below.
- A simple repository that showcases a few LLM evaluation strategies and leverages W&B Sweeps to optimize the LLM system. ☆12 · Jul 11, 2023 · Updated 2 years ago
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- Code and data for automatic paraphrase dataset augmentation. ☆11 · Mar 8, 2021 · Updated 4 years ago
- ☆13 · Jul 30, 2024 · Updated last year
- ☆17 · Dec 11, 2023 · Updated 2 years ago
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆21 · Aug 13, 2024 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆47 · Nov 19, 2023 · Updated 2 years ago
- ☆50 · Apr 10, 2024 · Updated last year
- ☆28 · Nov 10, 2025 · Updated 3 months ago
- EMNLP 2023 - InfoSeek: A New VQA Benchmark Focusing on Visual Info-Seeking Questions ☆25 · May 30, 2024 · Updated last year
- ☆28 · Sep 21, 2024 · Updated last year
- ☆29 · Apr 10, 2025 · Updated 10 months ago
- Make reasoning models scalable ☆47 · May 31, 2025 · Updated 9 months ago
- Logical inference system based on event semantics and degree semantics in formal semantics ☆11 · Jan 22, 2023 · Updated 3 years ago
- Do Multilingual Language Models Think Better in English? ☆42 · Aug 3, 2023 · Updated 2 years ago
- Repository for sparse finetuning of LLMs via a modified version of MosaicML's llmfoundry ☆42 · Jan 15, 2024 · Updated 2 years ago
- pnpm update for pnpm workspace catalogs. ☆16 · Mar 25, 2025 · Updated 11 months ago
- Python wrapper for the energy system optimization framework IESopt. ☆18 · Updated this week
- Multi-Agent LLM System for Digital Scam Protection ☆12 · Dec 19, 2024 · Updated last year
- WaPEN grammar reworked to look more like Python ☆14 · Updated this week
- ☆12 · Jan 11, 2026 · Updated last month
- GPT API Cost Estimation for Enterprises ☆13 · Oct 24, 2023 · Updated 2 years ago
- Automate your blogging with AI-powered tools for creating, optimizing, and deploying content. Generate SEO-optimized articles effortlessl… ☆12 · Aug 16, 2024 · Updated last year
- ☆12 · May 23, 2024 · Updated last year
- Demonstrates using MCP with the Pydantic AI framework ☆14 · Mar 14, 2025 · Updated 11 months ago
- Scraping Douban Books with Scrapy ☆10 · Aug 19, 2016 · Updated 9 years ago
- A script that generates an OpenAPI 3.1.0 schema based on your Airtable base structure. This schema is designed for use with Custom GPT (C… ☆12 · Sep 26, 2024 · Updated last year
- ☆40 · Mar 30, 2022 · Updated 3 years ago
- [CVPR2024] Learning from Synthetic Human Group Activities ☆14 · Feb 24, 2025 · Updated last year
- Encode and decode 26-bit, 34-bit, or 38-bit Wiegand protocol credentials for communicating with access control systems in TypeScript or J… ☆12 · Sep 3, 2024 · Updated last year
- ☆14 · Feb 11, 2026 · Updated 3 weeks ago
- A Swedish Natural Language Understanding Benchmark ☆11 · Dec 12, 2025 · Updated 2 months ago
- ☆49 · Aug 6, 2024 · Updated last year
- FastAPI Microservices Architecture SDK, intended as a basis for multiple services in a platform/system ☆12 · Oct 4, 2022 · Updated 3 years ago
- ☆11 · Jan 8, 2024 · Updated 2 years ago
- Semantic Kernel Workshop ☆13 · Feb 20, 2026 · Updated 2 weeks ago
- A microblog that puts you at ease ☆14 · Feb 26, 2026 · Updated last week
- ☆11 · Oct 15, 2022 · Updated 3 years ago
- Transparent Reporting of Ethics for Generative AI (TREGAI) Checklist ☆15 · Oct 16, 2024 · Updated last year