mlfoundations / evalchemy
Automatic evals for LLMs
☆429 · Updated 2 weeks ago
Alternatives and similar repositories for evalchemy
Users interested in evalchemy often compare it to the libraries listed below.
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆713 · Updated 3 months ago
- Official repository for ORPO ☆455 · Updated last year
- Reproducible, flexible LLM evaluations ☆213 · Updated last month
- ☆773 · Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆379 · Updated last week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,629 · Updated this week
- RewardBench: the first evaluation tool for reward models. ☆604 · Updated last week
- ☆520 · Updated 7 months ago
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning ☆410 · Updated last week
- A simple unified framework for evaluating LLMs ☆217 · Updated 2 months ago
- Recipes to scale inference-time compute of open models ☆1,095 · Updated 3 weeks ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆545 · Updated 3 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆357 · Updated 9 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆242 · Updated 7 months ago
- An Open Source Toolkit For LLM Distillation ☆651 · Updated 2 weeks ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆329 · Updated this week
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆201 · Updated last week
- awesome synthetic (text) datasets ☆282 · Updated 7 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆486 · Updated last month
- ☆569 · Updated 2 months ago
- ☆297 · Updated 3 weeks ago
- PyTorch building blocks for the OLMo ecosystem ☆234 · Updated this week
- ☆938 · Updated 4 months ago
- Synthetic data curation for post-training and structured data extraction ☆1,404 · Updated this week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆236 · Updated last month
- Tina: Tiny Reasoning Models via LoRA ☆258 · Updated 3 weeks ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆224 · Updated 7 months ago
- A project to improve skills of large language models ☆423 · Updated this week
- ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning ☆936 · Updated last month
- Scalable toolkit for efficient model alignment ☆814 · Updated 3 weeks ago