lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆297 · Updated last year
Alternatives and similar repositories for llm-decontaminator:
Users interested in llm-decontaminator are comparing it to the libraries listed below:
- Manage scalable open LLM inference endpoints in Slurm clusters ☆247 · Updated 6 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs ☆382 · Updated 9 months ago
- ☆484 · Updated last month
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆204 · Updated 7 months ago
- A simple unified framework for evaluating LLMs ☆164 · Updated 3 weeks ago
- Scaling Data-Constrained Language Models ☆330 · Updated 3 months ago
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆447 · Updated 9 months ago
- Experiments on speculative sampling with Llama models ☆122 · Updated last year
- A project to improve skills of large language models ☆230 · Updated this week
- The official evaluation suite and dynamic data release for MixEval ☆233 · Updated 2 months ago
- A bagel, with everything. ☆315 · Updated 9 months ago
- [ICML'24] Data and code for the paper "Training-Free Long-Context Scaling of Large Language Models" ☆377 · Updated 3 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆184 · Updated 5 months ago
- A repository for research on medium-sized language models ☆484 · Updated this week
- Expert Specialized Fine-Tuning ☆167 · Updated 3 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆206 · Updated 2 months ago
- EvolKit is a framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models ☆197 · Updated 2 months ago
- RewardBench: the first evaluation tool for reward models ☆491 · Updated last week
- Pre-training code for the Amber 7B LLM ☆160 · Updated 8 months ago
- Reproducible, flexible LLM evaluations ☆118 · Updated last month
- Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆303 · Updated 3 months ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation ☆287 · Updated 2 months ago
- An open-source toolkit for LLM distillation ☆425 · Updated last week
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction" ☆153 · Updated 3 months ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach ☆178 · Updated last month
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆164 · Updated 2 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆273 · Updated this week
- A family of compressed models obtained via pruning and knowledge distillation ☆309 · Updated 2 months ago
- DSIR large-scale data selection framework for language model training ☆242 · Updated 9 months ago
- Official repository for ORPO ☆430 · Updated 7 months ago