MiuLab / LLM-Eval
☆15 Updated 2 years ago
Alternatives and similar repositories for LLM-Eval
Users who are interested in LLM-Eval are comparing it to the repositories listed below
- Nexusflow function call, tool use, and agent benchmarks. ☆29 Updated 10 months ago
- ☆60 Updated 4 months ago
- ☆55 Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆100 Updated last week
- ☆24 Updated 2 months ago
- Open Implementations of LLM Analyses ☆107 Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 Updated last year
- ☆43 Updated last year
- Small and Efficient Mathematical Reasoning LLMs ☆72 Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆46 Updated last year
- Data preparation code for Amber 7B LLM ☆93 Updated last year
- Train, tune, and infer Bamba model ☆135 Updated 5 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 9 months ago
- ☆35 Updated 5 months ago
- Data preparation code for CrystalCoder 7B LLM ☆45 Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆58 Updated 3 weeks ago
- ☆36 Updated 3 months ago
- Aioli: A unified optimization framework for language model data mixing ☆28 Updated 9 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 Updated 6 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated 11 months ago
- ☆75 Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆151 Updated 9 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆22 Updated 11 months ago
- ☆40 Updated 5 months ago
- ☆96 Updated 7 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆78 Updated last year
- ☆20 Updated 7 months ago
- Evaluating LLMs with fewer examples ☆165 Updated last year
- Pre-training code for CrystalCoder 7B LLM ☆55 Updated last year
- Evaluating LLMs with CommonGen-Lite ☆91 Updated last year