tcapelle / mistral_wandb
A full-fledged mistral+wandb
☆13Updated 10 months ago
Alternatives and similar repositories for mistral_wandb
Users that are interested in mistral_wandb are comparing it to the libraries listed below
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"☆18Updated 2 weeks ago
- Includes examples on how to evaluate LLMs☆23Updated 7 months ago
- Sample notebooks and prompts for LLM evaluation☆134Updated 2 weeks ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…☆107Updated last year
- This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, collaborative catalogue of LLM evaluation fr…☆17Updated last year
- ☆47Updated last year
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.☆156Updated last week
- Retrieval Augmented Generation Generalized Evaluation Dataset☆53Updated 7 months ago
- A small library of LLM judges☆216Updated last week
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data☆68Updated last year
- This repo is the central repo for all the RAG Evaluation reference material and partner workshop☆65Updated 2 months ago
- ☆69Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆49Updated 11 months ago
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main)☆23Updated last month
- ☆39Updated 11 months ago
- ☆77Updated last year
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning☆46Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification☆104Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap☆87Updated 8 months ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets☆217Updated last year
- ☆23Updated last year
- ☆19Updated last year
- A framework for fine-tuning retrieval-augmented generation (RAG) systems.☆112Updated this week
- Official Code Release for "Training a Generally Curious Agent"☆25Updated last month
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use…☆122Updated last month
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph☆144Updated last year
- ☆92Updated 3 weeks ago
- ☆51Updated 7 months ago
- A curated list of materials on AI guardrails☆38Updated 3 weeks ago
- ☆75Updated 5 months ago