openai / simple-evals
☆1,954Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for simple-evals
- Reaching LLaMA2 Performance with 0.1M Dollars☆960Updated 3 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆1,634Updated this week
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling☆1,529Updated 4 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy☆1,565Updated 3 months ago
- Tools for merging pretrained large language models.☆4,816Updated 2 weeks ago
- The official implementation of Self-Play Fine-Tuning (SPIN)☆1,045Updated 6 months ago
- SGLang is a fast serving framework for large language models and vision language models.☆6,127Updated this week
- Training LLMs with QLoRA + FSDP☆1,418Updated last week
- ☆935Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆811Updated this week
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.☆1,529Updated last week
- DataComp for Language Models☆1,157Updated this week
- ☆4,035Updated 5 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI☆1,336Updated 7 months ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…☆4,653Updated this week
- A native PyTorch Library for large model training☆2,623Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆2,045Updated this week
- Code for Quiet-STaR☆651Updated 3 months ago
- ☆2,746Updated 2 months ago
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"☆883Updated last month
- Arena-Hard-Auto: An automatic LLM benchmark.☆653Updated last week
- PyTorch native finetuning library☆4,336Updated this week
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.☆1,824Updated 2 weeks ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering☆517Updated 2 weeks ago
- ReFT: Representation Finetuning for Language Models☆1,159Updated 2 weeks ago
- ☆2,506Updated 6 months ago
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.☆687Updated 2 months ago
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.☆1,524Updated 2 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯☆797Updated 2 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,057Updated 2 months ago