MiuLab / LLM-Eval
☆15 Updated 2 years ago
Alternatives and similar repositories for LLM-Eval
Users who are interested in LLM-Eval are comparing it to the libraries listed below.
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 Updated 11 months ago
- Train, tune, and infer Bamba model ☆131 Updated 2 months ago
- Data preparation code for CrystalCoder 7B LLM ☆45 Updated last year
- ☆53 Updated 9 months ago
- Data preparation code for Amber 7B LLM ☆91 Updated last year
- ☆37 Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆81 Updated last week
- Verifiers for LLM Reinforcement Learning ☆69 Updated 3 months ago
- ☆93 Updated 4 months ago
- Open Implementations of LLM Analyses ☆105 Updated 10 months ago
- Advanced Reasoning Benchmark Dataset for LLMs ☆47 Updated last year
- ☆41 Updated last year
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project. ☆25 Updated 8 months ago
- Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆45 Updated this week
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆92 Updated 2 months ago
- Simple examples using Argilla tools to build AI ☆53 Updated 8 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated last year
- ☆54 Updated last month
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆72 Updated 8 months ago
- minimal GRPO implementation from scratch ☆94 Updated 4 months ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval ☆22 Updated this week
- ☆95 Updated 10 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆97 Updated last year
- a curated list of the role of small models in the LLM era ☆103 Updated 10 months ago
- ☆24 Updated 10 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 Updated last year
- Evaluating LLMs with fewer examples ☆160 Updated last year
- ☆73 Updated 3 weeks ago
- The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆86 Updated 2 weeks ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆100 Updated 3 months ago