philschmid / evaluate-llms

Includes examples on how to evaluate LLMs

☆23

Alternatives and similar repositories for evaluate-llms

Users who are interested in evaluate-llms are comparing it to the repositories listed below.

rajshah4 / LLM-Evaluation
Sample notebooks and prompts for LLM evaluation
☆151 Updated last week
ibm-self-serve-assets / SuperKnowa
Build Enterprise RAG (Retrieval Augmented Generation) Pipelines to tackle various Generative AI use cases with LLMs by simply plugging co…
☆113 Updated last year
tahreemrasul / semantic_research_engine
A semantic research engine to get relevant papers based on a user query. Application frontend with Chainlit Copilot. Observability with L…
☆82 Updated last year
anyscale / e2e-llm-workflows
Fine-tune an LLM to perform batch inference and online serving.
☆112 Updated 4 months ago
fw-ai / cookbook
Recipes and resources for building, deploying, and fine-tuning generative AI with Fireworks.
☆124 Updated 2 weeks ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆50 Updated last year
davanstrien / data-for-fine-tuning-llms
☆80 Updated last year
anakin87 / mistral-haystack
Mistral + Haystack: build RAG pipelines that rock 🤘
☆106 Updated last year
mlabonne / how-to-data-science
Scripts, notebooks, and articles about data science in general.
☆52 Updated 2 years ago
apple / ml-superposition-prompting
☆146 Updated last year
PrithivirajDamodaran / Route0x
Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da
☆116 Updated 6 months ago
AhmedSSoliman / Llama2-CodeGen-Fine-Tuning-LLama-2
☆15 Updated 2 years ago
infocusp / llm_seminar_series
Material for the series of seminars on Large Language Models
☆34 Updated last year
muellerzr / nbdistributed
Seamless interface for using PyTorch distributed with Jupyter notebooks
☆50 Updated last month
keitazoumana / multimodal-rag-esg
The application of multimodal RAG for Sustainable finance
☆23 Updated last year
AI-Maker-Space / Awesome-AIM-Index
An index of all of our weekly concepts + code events for aspiring AI Engineers and Business Leaders!!
☆86 Updated this week
alopatenko / LLMEvaluation
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use…
☆143 Updated this week
evidentlyai / community-examples
Examples of using Evidently to evaluate, test and monitor ML models.
☆40 Updated 3 weeks ago
pacman100 / peft-codegen-25
☆23 Updated 2 years ago
amogkam / llama_index_ray
Using LlamaIndex with Ray for productionizing LLM applications
☆71 Updated 2 years ago
jjovalle99 / agentic-design-patterns
☆14 Updated last year
georgesung / LLM-WikipediaQA
Document Q&A on Wikipedia articles using LLMs
☆79 Updated 2 years ago
anyscale / ray-summit-2023-training
☆88 Updated 2 years ago
S1M0N38 / dspy-arxiv
Explore the use of DSPy for extracting features from PDFs 🔎
☆46 Updated last year
tcapelle / llm_recipes
A set of scripts and notebooks on LLM fine-tuning and dataset creation
☆110 Updated last year
jayita13 / GenerativeAI
GenAI Experimentation
☆57 Updated last month
CVxTz / llm-serve-tutorial
☆20 Updated last year
Paulescu / testing-llms-in-the-real-world
Test LLMs automatically with Giskard and CI/CD
☆31 Updated last year
Rachnog / intro_to_llm_agents
Simple introduction to LLM Agents
☆139 Updated last year
qdrant / qdrant-rag-eval
The central repo for all RAG evaluation reference material and partner workshops
☆76 Updated 5 months ago