aymeric-roucher / benchmark_agents
☆27 · Updated last year
Alternatives and similar repositories for benchmark_agents
Users who are interested in benchmark_agents are comparing it to the libraries listed below.
- Compare how Agent systems perform on several benchmarks. ☆103 · Updated 6 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆51 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆77 · Updated last year
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated 2 years ago
- ☆84 · Updated 2 years ago
- ☆43 · Updated last year
- Reward Model framework for LLM RLHF ☆62 · Updated 2 years ago
- Codebase accompanying the Summary of a Haystack paper. ☆80 · Updated last year
- ☆55 · Updated 5 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆73 · Updated 2 years ago
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆137 · Updated 2 years ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆126 · Updated 3 months ago
- ☆86 · Updated 2 years ago
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆169 · Updated 2 years ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆72 · Updated last year
- LLM_library is a comprehensive repository that serves as a one-stop resource for hands-on code and insightful summaries. ☆69 · Updated 2 years ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆108 · Updated 4 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆277 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆94 · Updated last year
- A collection of hands-on notebooks for LLM practitioners ☆51 · Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 · Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆120 · Updated 3 months ago
- Testing speed and accuracy of RAG with and without a Cross Encoder Reranker. ☆50 · Updated 2 years ago
- Simple introduction to LLM Agents ☆140 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆112 · Updated last year
- ☆16 · Updated 2 years ago
- ☆82 · Updated 3 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆60 · Updated last year
- ☆23 · Updated 2 years ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆101 · Updated 2 years ago