lm-sys/llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
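The paper's decontamination method — retrieve the training samples most similar to each benchmark test case, then ask an LLM judge whether any of them is a rephrase — can be sketched as follows. This is a minimal illustration, not the repo's implementation: the word-overlap similarity stands in for the embedding-based retrieval, and `judge` stands in for the LLM judging step.

```python
def overlap_similarity(a: str, b: str) -> float:
    """Word-overlap (Jaccard) similarity; a stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def top_k_similar(test_case: str, train_set: list[str], k: int = 3) -> list[str]:
    # Step 1: retrieve the k training samples most similar to the test case.
    ranked = sorted(train_set, key=lambda t: overlap_similarity(test_case, t), reverse=True)
    return ranked[:k]

def is_contaminated(test_case: str, train_set: list[str], judge, k: int = 3) -> bool:
    # Step 2: a judge decides whether any retrieved candidate is a rephrase of
    # the test case; `judge` is any callable (str, str) -> bool, which in the
    # real pipeline would be an LLM prompt rather than a string comparison.
    return any(judge(test_case, cand) for cand in top_k_similar(test_case, train_set, k))
```

With a real embedding model and an LLM judge plugged in, test cases flagged by `is_contaminated` would be reported as (rephrased) contamination against the training set.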
Related projects:
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Sequence Lengths (ICLR 2024)
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
- Manage scalable open LLM inference endpoints in Slurm clusters
- Arena-Hard-Auto: An automatic LLM benchmark.
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
- Multipack distributed sampler for fast padding-free training of LLMs
- Benchmarking LLMs with Challenging Tasks from Real Users
- The official evaluation suite and dynamic data release for MixEval.
- A pipeline to improve skills of large language models
- An Open Source Toolkit For LLM Distillation
- RewardBench: the first evaluation tool for reward models.
- Scaling Data-Constrained Language Models
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answers
- Official repository for ORPO
- A bagel, with everything.
- Expert Specialized Fine-Tuning
- A simple unified framework for evaluating LLMs
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts
- [ICML 2024] CLLMs: Consistency Large Language Models
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
- Generative Representational Instruction Tuning
- Experiments on speculative sampling with Llama models
- batched loras
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
- An Analytical Evaluation Board of Multi-turn LLM Agents
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation