aymeric-roucher / benchmark_agents
☆22 Updated 8 months ago
Related projects
Alternatives and complementary repositories for benchmark_agents
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated last month
- ☆40 Updated last week
- PyTorch implementation for MRL ☆18 Updated 8 months ago
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆41 Updated 10 months ago
- A set of scripts and notebooks on LLM finetuning and dataset creation ☆92 Updated last month
- ☆87 Updated 9 months ago
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆100 Updated 9 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 Updated 9 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 Updated 8 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆100 Updated 2 months ago
- Set of scripts to finetune LLMs ☆36 Updated 7 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆73 Updated 2 months ago
- Testing speed and accuracy of RAG with and without a Cross Encoder Reranker. ☆47 Updated 10 months ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆61 Updated 8 months ago
- End-to-End LLM Guide ☆97 Updated 4 months ago
- ☆64 Updated 5 months ago
- ☆24 Updated last year
- ☆31 Updated 9 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆97 Updated 7 months ago
- ☆29 Updated 4 months ago
- Building a chatbot powered with a RAG pipeline to read, summarize, and quote the most relevant papers related to the user query. ☆162 Updated 6 months ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆43 Updated 10 months ago
- ☆16 Updated last year
- Code for NeurIPS LLM Efficiency Challenge ☆53 Updated 7 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆26 Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆161 Updated 9 months ago
- This playlab encompasses a multitude of projects crafted through the utilization of Large Language Models, showcasing the versatility and… ☆74 Updated last month
- ☆103 Updated 2 months ago
- A competition to get you started on the NeurIPS AI Hackercup ☆27 Updated last month
- ☆92 Updated last month