MiuLab / LLM-Eval
☆15 · Updated last year
Alternatives and similar repositories for LLM-Eval:
Users interested in LLM-Eval are comparing it to the libraries listed below.
- ☆20 · Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆77 · Updated 7 months ago
- Train, tune, and run inference with the Bamba model ☆88 · Updated this week
- This repository contains the expert evaluation interface and data evaluation script for the OpenScholar project. ☆24 · Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆76 · Updated last year
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆23 · Updated 3 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 11 months ago
- ☆40 · Updated 8 months ago
- ☆53 · Updated last week
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆86 · Updated 2 weeks ago
- ☆24 · Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆63 · Updated last month
- Implementation of "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models" ☆27 · Updated 2 months ago
- Pre-training code for CrystalCoder 7B LLM ☆54 · Updated 11 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆69 · Updated this week
- Advanced Reasoning Benchmark Dataset for LLMs ☆45 · Updated last year
- This is a new metric that can be used to evaluate the faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- ☆17 · Updated 3 weeks ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 2 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric, reference answer, absolute… ☆49 · Updated 9 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆34 · Updated 7 months ago
- Code and Data for "Language Modeling with Editable External Knowledge" ☆32 · Updated 10 months ago
- Evaluating LLMs with fewer examples ☆151 · Updated last year
- ☆24 · Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆31 · Updated 8 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated 11 months ago
- ☆24 · Updated 7 months ago