philschmid / evaluate-llms
Includes examples on how to evaluate LLMs
☆19Updated 2 months ago
Related projects:
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆48Updated 2 months ago
- ☆24Updated last year
- ☆71Updated 3 months ago
- Codebase accompanying the Summary of a Haystack paper.☆65Updated 2 months ago
- ☆15Updated last year
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…☆27Updated 3 weeks ago
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems☆10Updated 9 months ago
- Repository containing awesome resources regarding Hugging Face tooling.☆43Updated 8 months ago
- Resources for exploring Generative Feedback Loops with Weaviate!☆35Updated last year
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning☆40Updated 9 months ago
- ☆41Updated last year
- Retrieval Augmented Generation applications☆27Updated 11 months ago
- Supervised instruction finetuning for LLMs with HF Trainer and DeepSpeed☆32Updated last year
- An index of all of our weekly concepts + code events for aspiring AI Engineers and Business Leaders!!☆40Updated last week
- ☆28Updated 7 months ago
- This repository contains the implementation of evaluation metrics for recommendation systems. We have compared similarity, candidate gene…☆14Updated 8 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB.☆40Updated 6 months ago
- The LangChain Crash Course Repository is a concise and comprehensive collection of learning materials for the LangChain programming langu…☆19Updated last year
- End-to-End LLM Guide☆91Updated 2 months ago
- Streamlit app for recommending eval functions using prompt diffs☆24Updated 8 months ago
- Test LLMs automatically with Giskard and CI/CD☆28Updated last month
- ☆14Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data☆26Updated last year
- Supplementary material for "Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to Adapters"☆42Updated last year
- A competition to get you started on the NeurIPS AI Hackercup☆22Updated this week
- ☆24Updated 2 months ago
- Fullstack chatbot application☆11Updated last month
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.☆91Updated last week
- Mistral + Haystack: build RAG pipelines that rock 🤘☆99Updated 7 months ago
- Chunk your text using gpt4o-mini more accurately☆37Updated last month