aws-samples / evaluating-large-language-models-using-llm-as-a-judge
☆12Updated this week
Alternatives and similar repositories for evaluating-large-language-models-using-llm-as-a-judge:
Users who are interested in evaluating-large-language-models-using-llm-as-a-judge are comparing it to the repositories listed below
- ☆18Updated 3 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…☆23Updated 2 months ago
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp☆14Updated last year
- The official evaluation suite and dynamic data release for MixEval.☆10Updated 3 months ago
- A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.☆12Updated 3 months ago
- Streamlit app for recommending eval functions using prompt diffs☆26Updated last year
- Tool to take your ML model from local to production with one line of code.☆25Updated 11 months ago
- Unstract's interface to LLMs, Embeddings and VectorDBs.☆18Updated 5 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data☆29Updated 3 months ago
- Public reports detailing responses to sets of prompts by Large Language Models.☆28Updated 2 weeks ago
- Rust bindings for CTranslate2☆14Updated last year
- AI_Powered_Dev_Search_Engine☆12Updated 10 months ago
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread☆18Updated 9 months ago
- Fullstack chatbot application☆11Updated 5 months ago
- ☆19Updated 2 months ago
- Training hybrid models for dummies.☆16Updated last month
- A Python command-line tool to download & manage MLX AI models from Hugging Face.☆16Updated 4 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- Measuring and Controlling Persona Drift in Language Model Dialogs☆15Updated 10 months ago
- Agent computer interface for AI software engineer.☆22Updated this week
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval☆14Updated last year
- Run LLMs on Replicate with vLLM☆15Updated 3 months ago
- ☆39Updated last month
- Writing Blog Posts with Generative Feedback Loops!☆46Updated 9 months ago
- Supervised instruction finetuning for LLM with HF trainer and Deepspeed☆34Updated last year
- This repository implements DSPy programs for tasks in Indian languages☆11Updated last year
- ☆17Updated 3 weeks ago
- Tools for merging pretrained large language models.☆19Updated 7 months ago
- Demos of some issues with LangChain.☆31Updated last year
- Efficient query encoding for dense retrieval☆11Updated 5 months ago