WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆138 · Updated this week
Related projects
Alternatives and complementary repositories for ZeroEval
- Benchmarking LLMs with Challenging Tasks from Real Users ☆194 · Updated last week
- ☆101 · Updated last month
- The official evaluation suite and dynamic data release for MixEval. ☆222 · Updated this week
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆101 · Updated 3 weeks ago
- Evaluating LLMs with fewer examples ☆133 · Updated 7 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆110 · Updated 4 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆114 · Updated this week
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆200 · Updated 5 months ago
- ☆89 · Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆119 · Updated 3 weeks ago
- ☆111 · Updated last month
- Code and example data for the paper: Rule Based Rewards for Language Model Safety ☆153 · Updated 3 months ago
- ☆294 · Updated 5 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆143 · Updated 3 weeks ago
- Expert Specialized Fine-Tuning ☆143 · Updated last month
- ☆116 · Updated 5 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆176 · Updated 3 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 · Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆124 · Updated 2 weeks ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. ☆150 · Updated 2 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆173 · Updated last week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆106 · Updated 2 weeks ago
- Open-source code for the paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆156 · Updated 3 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆118 · Updated last month
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆95 · Updated this week
- A pipeline for LLM knowledge distillation ☆77 · Updated 3 months ago
- ☆42 · Updated this week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆160 · Updated last month
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆141 · Updated 9 months ago