MiuLab / LLM-Eval
☆15 · Updated 2 years ago
Alternatives and similar repositories for LLM-Eval
Users that are interested in LLM-Eval are comparing it to the libraries listed below
- ☆78 · Updated 2 weeks ago
- ☆54 · Updated 9 months ago
- Train, tune, and infer Bamba model · ☆131 · Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… · ☆88 · Updated this week
- Advanced Reasoning Benchmark Dataset for LLMs · ☆47 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆60 · Updated last year
- Data preparation code for CrystalCoder 7B LLM · ☆45 · Updated last year
- Data preparation code for Amber 7B LLM · ☆91 · Updated last year
- ☆56 · Updated 2 months ago
- A framework for few-shot evaluation of autoregressive language models · ☆12 · Updated last month
- Open Implementations of LLM Analyses · ☆106 · Updated 10 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models · ☆92 · Updated 3 months ago
- ☆48 · Updated last year
- ☆121 · Updated 6 months ago
- ☆34 · Updated last month
- Small and Efficient Mathematical Reasoning LLMs · ☆71 · Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna · ☆55 · Updated 7 months ago
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project · ☆25 · Updated 9 months ago
- ☆33 · Updated 2 weeks ago
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models" · ☆44 · Updated last year
- Aioli: A unified optimization framework for language model data mixing · ☆26 · Updated 7 months ago
- ☆90 · Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper · ☆79 · Updated 11 months ago
- Verifiers for LLM Reinforcement Learning · ☆71 · Updated 4 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 · ☆75 · Updated 8 months ago
- An NVIDIA AI Workbench Example Project for Finetuning Llama 2 · ☆30 · Updated last year
- ☆85 · Updated last year
- Just a bunch of benchmark logs for different LLMs · ☆120 · Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation · ☆29 · Updated 6 months ago
- ☆40 · Updated 3 months ago