huggingface / yourbench
🤗 Benchmark Large Language Models Reliably On Your Data
☆315 Updated this week
Alternatives and similar repositories for yourbench
Users that are interested in yourbench are comparing it to the libraries listed below
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆289 Updated 2 weeks ago
- awesome synthetic (text) datasets ☆281 Updated 7 months ago
- ☆118 Updated 9 months ago
- Automatic evals for LLMs ☆399 Updated this week
- Let's build better datasets, together! ☆258 Updated 5 months ago
- Build datasets using natural language ☆479 Updated 3 weeks ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆220 Updated 7 months ago
- Late Interaction Models Training & Retrieval ☆385 Updated this week
- A simple tool that lets you explore different possible paths that an LLM might sample. ☆169 Updated 3 weeks ago
- Automatically evaluate your LLMs in Google Colab ☆629 Updated last year
- An Open Source Toolkit For LLM Distillation ☆612 Updated last month
- A Lightweight Library for AI Observability ☆243 Updated 3 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,563 Updated last week
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆235 Updated 7 months ago
- code for training & evaluating Contextual Document Embedding models ☆191 Updated 2 weeks ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆390 Updated 5 months ago
- ☆152 Updated 6 months ago
- Together Open Deep Research ☆298 Updated last month
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆119 Updated 3 weeks ago
- Tool for generating high quality Synthetic datasets ☆878 Updated last week
- CodeScientist: An automated scientific discovery system for code-based experiments ☆263 Updated 2 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆103 Updated last month
- ☆142 Updated last month
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆169 Updated 8 months ago
- This project showcases an LLMOps pipeline that fine-tunes a small-size LLM model to prepare for the outage of the service LLM. ☆305 Updated 2 months ago
- Simple UI for debugging correlations of text embeddings ☆180 Updated this week
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆208 Updated this week
- ☆256 Updated 5 months ago
- ☆121 Updated last month
- Solving data for LLMs - Create quality synthetic datasets! ☆148 Updated 4 months ago