MiuLab / LLM-Eval
☆15 · Updated 2 years ago
Alternatives and similar repositories for LLM-Eval
Users interested in LLM-Eval are comparing it to the libraries listed below.
- Small and Efficient Mathematical Reasoning LLMs · ☆73 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆61 · Updated last year
- Data preparation code for Amber 7B LLM · ☆94 · Updated last year
- Submodule of evalverse forked from [google-research/instruction_following_eval](https://github.com/google-research/google-research/tree/m… · ☆14 · Updated last year
- Data preparation code for CrystalCoder 7B LLM · ☆45 · Updated last year
- Analysis code for the NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" · ☆55 · Updated 4 months ago
- Expert evaluation interface and data evaluation script for the OpenScholar project · ☆29 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs · ☆47 · Updated 2 years ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 · ☆76 · Updated last year
- Nexusflow function call, tool use, and agent benchmarks · ☆30 · Updated last year
- Verifiers for LLM Reinforcement Learning · ☆80 · Updated 8 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation · ☆29 · Updated 10 months ago
- Codebase accompanying the Summary of a Haystack paper · ☆80 · Updated last year
- Just a bunch of benchmark logs for different LLMs · ☆119 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. · ☆78 · Updated last year
- A framework for few-shot evaluation of autoregressive language models · ☆12 · Updated 5 months ago
- Pre-training code for CrystalCoder 7B LLM · ☆55 · Updated last year
- Aioli: A unified optimization framework for language model data mixing · ☆31 · Updated 11 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… · ☆51 · Updated last year