aymeric-roucher / benchmark_agents
☆25 · Updated last year
Alternatives and similar repositories for benchmark_agents:
Users who are interested in benchmark_agents are comparing it to the libraries listed below.
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 8 months ago
- ☆52 · Updated last month
- Various installation guides for Large Language Models ☆64 · Updated 4 months ago
- ☆14 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆73 · Updated 5 months ago
- ☆24 · Updated last year
- LLM reads a paper and produces a working prototype ☆51 · Updated 2 weeks ago
- Build Agentic workflows with function calling using open LLMs ☆26 · Updated last week
- A collection of hands-on notebooks for LLM practitioners ☆47 · Updated 2 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆59 · Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆92 · Updated 5 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆102 · Updated 11 months ago
- Testing the speed and accuracy of RAG with and without a Cross Encoder Reranker. ☆48 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆76 · Updated 6 months ago
- ☆48 · Updated 4 months ago
- Fine-tuning the multimodal LLM "Idefics 9B" on a Pokemon Go dataset available on Hugging Face. ☆19 · Updated last year
- ☆29 · Updated last year
- Automatic Prompt Optimization ☆28 · Updated 10 months ago
- A Hands-on Practical Guide to LlamaIndex ☆32 · Updated 5 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆67 · Updated 4 months ago
- ☆45 · Updated 11 months ago
- RAG example using DSPy, Gradio, FastAPI ☆76 · Updated 11 months ago
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆166 · Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 2 months ago
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆99 · Updated 11 months ago
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆46 · Updated 6 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- LLM_library is a comprehensive repository that serves as a one-stop resource for hands-on code and insightful summaries. ☆69 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ☆47 · Updated last year
- ☆76 · Updated 9 months ago