MiuLab / LLM-Eval
☆15 Updated last year
Alternatives and similar repositories for LLM-Eval
Users who are interested in LLM-Eval are comparing it to the libraries listed below.
- ☆52 Updated 8 months ago
- ☆69 Updated last month
- Train, tune, and infer Bamba model ☆130 Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 Updated 10 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆73 Updated 2 weeks ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 Updated last year
- ☆95 Updated 9 months ago
- Data preparation code for Amber 7B LLM ☆91 Updated last year
- ☆34 Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 Updated 2 months ago
- Verifiers for LLM Reinforcement Learning ☆65 Updated 3 months ago
- Data preparation code for CrystalCoder 7B LLM ☆45 Updated last year
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆36 Updated last year
- Evaluating LLMs with fewer examples ☆160 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated 9 months ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆55 Updated 9 months ago
- Pre-training code for CrystalCoder 7B LLM ☆54 Updated last year
- Open Implementations of LLM Analyses ☆105 Updated 9 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆63 Updated 10 months ago
- An NVIDIA AI Workbench Example Project for Finetuning Llama 2 ☆29 Updated 10 months ago
- ☆40 Updated 7 months ago
- ☆39 Updated last year
- ☆36 Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆108 Updated 3 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆97 Updated last year
- ☆68 Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 5 months ago
- ☆48 Updated 5 months ago
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project. ☆25 Updated 7 months ago