lmarena / arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
☆653 · Updated last week
Related projects
Alternatives and complementary repositories for arena-hard-auto
- Code for Quiet-STaR ☆651 · Updated 3 months ago
- ☆451 · Updated 3 weeks ago
- ☆935 · Updated 2 weeks ago
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… ☆491 · Updated 2 weeks ago
- Large Reasoning Models ☆580 · Updated this week
- OLMoE: Open Mixture-of-Experts Language Models ☆460 · Updated this week
- A reading list on LLM based Synthetic Data Generation 🔥 ☆791 · Updated 2 weeks ago
- This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models? ☆723 · Updated 3 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆685 · Updated this week
- ☆515 · Updated this week
- A library for easily merging multiple LLM experts, and efficiently train the merged LLM. ☆408 · Updated 2 months ago
- Automatically evaluate your LLMs in Google Colab ☆559 · Updated 6 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆811 · Updated this week
- LiveBench: A Challenging, Contamination-Free LLM Benchmark ☆294 · Updated 2 weeks ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆797 · Updated 2 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆647 · Updated last month
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ☆883 · Updated last month
- The official evaluation suite and dynamic data release for MixEval. ☆224 · Updated last week
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆339 · Updated 2 months ago
- ☆819 · Updated last month
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆293 · Updated 11 months ago
- [NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models ☆535 · Updated 3 weeks ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … ☆328 · Updated 5 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆448 · Updated 8 months ago
- An Open Source Toolkit For LLM Distillation ☆356 · Updated 2 months ago
- Official repository for ORPO ☆421 · Updated 5 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,634 · Updated this week
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆438 · Updated 8 months ago
- A compact LLM pretrained in 9 days by using high quality data ☆262 · Updated last month
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆212 · Updated last month