mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
⭐613 · Updated 11 months ago
Alternatives and similar repositories for llm-autoeval:
Users interested in llm-autoeval are comparing it to the repositories listed below
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. ⭐468 · Updated 7 months ago
- Evaluate your LLM's response with Prometheus and GPT-4 🎯 ⭐901 · Updated 3 weeks ago
- ⭐509 · Updated 4 months ago
- Awesome synthetic (text) datasets ⭐267 · Updated 5 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ⭐273 · Updated 8 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ⭐1,400 · Updated this week
- This is our own implementation of 'Layer Selective Rank Reduction' ⭐234 · Updated 10 months ago
- ⭐524 · Updated 7 months ago
- Generate textbook-quality synthetic LLM pretraining data ⭐498 · Updated last year
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ⭐300 · Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ⭐210 · Updated 5 months ago
- ⭐849 · Updated 7 months ago
- Official repository for ORPO ⭐447 · Updated 10 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ⭐648 · Updated 10 months ago
- A bagel, with everything. ⭐318 · Updated last year
- NexusRaven-13B, a new SOTA open-source LLM for function calling. This repo contains everything for reproducing our evaluation on NexusRav… ⭐313 · Updated last year
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ⭐428 · Updated 6 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ⭐319 · Updated 5 months ago
- An Open Source Toolkit For LLM Distillation ⭐569 · Updated 3 months ago
- Automatic evals for LLMs ⭐361 · Updated this week
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ⭐186 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ⭐231 · Updated 5 months ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ⭐218 · Updated this week
- ⭐444 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ⭐254 · Updated 9 months ago
- X—LLM: Cutting Edge & Easy LLM Finetuning ⭐400 · Updated last year
- LLM Workshop by Sourab Mangrulkar ⭐373 · Updated 9 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ⭐693 · Updated last year
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ⭐1,130 · Updated 3 months ago
- Fine-tune Mistral-7B on 3090s, A100s, H100s ⭐709 · Updated last year