aws-samples / evaluating-large-language-models-using-llm-as-a-judge
☆12Updated 6 months ago
Related projects
Alternatives and complementary repositories for evaluating-large-language-models-using-llm-as-a-judge
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…☆23Updated last week
- ☆11Updated 3 weeks ago
- ☆20Updated 10 months ago
- Tool to take your ML model from local to production with one line of code.☆23Updated 9 months ago
- AI_Powered_Dev_Search_Engine☆12Updated 8 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data☆28Updated last month
- Creating Generative AI Apps that work☆16Updated 4 months ago
- Efficiently computing & storing token n-grams from large corpora☆15Updated last month
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.☆49Updated last week
- Public reports detailing responses to sets of prompts by Large Language Models.☆25Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Updated 5 months ago
- Open-source backend for Martian's LLM Inference Provider Leaderboard☆17Updated 3 months ago
- ☆40Updated last week
- Contains the model patches and the eval logs from the passing swe-bench-lite run.☆10Updated 4 months ago
- ☆41Updated last month
- Using open source LLMs to build synthetic datasets for direct preference optimization☆40Updated 8 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer☆36Updated 7 months ago
- ☆24Updated last year
- ☆12Updated last week
- Scripts supporting the development and serving of the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search☆10Updated last year
- A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of …☆21Updated last month
- ☆15Updated 3 weeks ago
- ☆48Updated last week
- Build Agentic workflows with function calling☆20Updated last week
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- ☆30Updated last year
- Tools for merging pretrained large language models.☆19Updated 5 months ago
- Full-stack chatbot application☆11Updated 3 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆48Updated 4 months ago
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images.☆23Updated 10 months ago