openai / simple-evals
☆2,729 Updated this week
Alternatives and similar repositories for simple-evals:
Users who are interested in simple-evals compare it to the libraries listed below.
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,462 Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,659 Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,829 Updated 8 months ago
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,724 Updated 4 months ago
- AllenAI's post-training codebase ☆2,926 Updated this week
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,148 Updated 11 months ago
- PyTorch native post-training library ☆5,123 Updated this week
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ☆1,124 Updated 2 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆981 Updated 9 months ago
- ☆2,527 Updated 11 months ago
- A framework for few-shot evaluation of language models. ☆8,761 Updated last week
- Minimalistic large language model 3D-parallelism training ☆1,808 Updated this week
- SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues? ☆2,862 Updated this week
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,378 Updated last year
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,671 Updated 9 months ago
- Recipes to scale inference-time compute of open models ☆1,058 Updated 2 months ago
- A library for advanced large language model reasoning ☆2,106 Updated 2 weeks ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,514 Updated last year
- A PyTorch native library for large-scale model training ☆3,627 Updated this week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,042 Updated last month
- ☆1,355 Updated 5 months ago
- Tools for merging pretrained large language models. ☆5,593 Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,365 Updated this week
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,470 Updated last year
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆911 Updated last month
- A bibliography and survey of the papers surrounding o1 ☆1,190 Updated 5 months ago
- ☆1,017 Updated 4 months ago
- An Open Large Reasoning Model for Real-World Solutions ☆1,483 Updated last month
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆685 Updated 2 weeks ago
- Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" ☆2,348 Updated 4 months ago