tcapelle / mistral_wandb
A full-fledged mistral+wandb
☆13 · Updated 11 months ago
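The description is terse, so here is a minimal sketch of what "mistral+wandb" typically involves: calling the Mistral chat API and logging the prompt, response, and token usage to a Weights & Biases run. This is not taken from the repository itself; it assumes the v1 `mistralai` SDK, a `MISTRAL_API_KEY` environment variable, and a hypothetical W&B project name.

```python
import os

import wandb
from mistralai import Mistral  # assumes the v1 mistralai SDK

# Hypothetical project name; the repo's actual W&B setup may differ.
run = wandb.init(project="mistral-wandb-demo")

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

prompt = "Summarize what Weights & Biases is in one sentence."
response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content

# Log token counts as metrics and the text pair as a table row,
# so each call is visible in the W&B run.
run.log(
    {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "chat": wandb.Table(columns=["prompt", "response"], data=[[prompt, answer]]),
    }
)
run.finish()
```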
Alternatives and similar repositories for mistral_wandb
Users interested in mistral_wandb are comparing it to the libraries listed below.
- A small library of LLM judges ☆232 · Updated 3 weeks ago
- ☆145 · Updated 11 months ago
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆68 · Updated last year
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆163 · Updated this week
- Sample notebooks and prompts for LLM evaluation ☆135 · Updated last month
- ☆20 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated 9 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆123 · Updated last week
- Inference-time scaling for LLMs-as-a-judge. ☆251 · Updated this week
- ☆40 · Updated last year
- LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments ☆219 · Updated this week
- ☆97 · Updated 2 weeks ago
- ☆53 · Updated last year
- Official Repo for CRMArena and CRMArena-Pro ☆101 · Updated 3 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆250 · Updated 9 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆28 · Updated 2 months ago
- This is the repo for the LegalBench-RAG paper: https://arxiv.org/abs/2408.10343 ☆108 · Updated last month
- Source code for the collaborative reasoner research project at Meta FAIR. ☆95 · Updated 3 months ago
- Includes examples of how to evaluate LLMs ☆23 · Updated 8 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆98 · Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated 10 months ago
- ☆70 · Updated this week
- ☆52 · Updated last year
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main) ☆23 · Updated 2 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆128 · Updated last year
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆217 · Updated last year
- ARAGOG - Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆107 · Updated last year
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆101 · Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 · Updated 4 months ago