stanford-crfm / EUAIActJune15
Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act
☆92 · Updated last year
Alternatives and similar repositories for EUAIActJune15:
Users interested in EUAIActJune15 are comparing it to the libraries listed below.
- Fiddler Auditor is a tool to evaluate language models. ☆174 · Updated 10 months ago
- Sample notebooks and prompts for LLM evaluation ☆119 · Updated last month
- The Foundation Model Transparency Index ☆73 · Updated 7 months ago
- ☆258 · Updated this week
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆213 · Updated last year
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆166 · Updated last year
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆43 · Updated last year
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆82 · Updated this week
- Command Line Interface for Hugging Face Inference Endpoints ☆67 · Updated 9 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆73 · Updated this week
- 📖 A curated list of resources dedicated to synthetic data ☆123 · Updated 2 years ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆100 · Updated last month
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆63 · Updated 10 months ago
- Doing simple retrieval from LLMs at various context lengths to measure accuracy; a minimal sketch of the idea follows this list. ☆99 · Updated 9 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker; see the Elo sketch after this list. ☆106 · Updated 3 weeks ago
- Automatic Evals for Instruction-Tuned Models ☆100 · Updated this week
- ReLM is a Regular Expression engine for Language Models ☆103 · Updated last year
- Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central … ☆47 · Updated 7 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆105 · Updated this week
- Red-Teaming Language Models with DSPy ☆153 · Updated 9 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆101 · Updated 7 months ago
- Framework for building and maintaining self-updating prompts for LLMs ☆59 · Updated 7 months ago
- Functional Benchmarks and the Reasoning Gap ☆82 · Updated 3 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆313 · Updated 2 months ago
- A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podc… ☆61 · Updated this week
- ☆24 · Updated last year
- Make it easy to automatically and uniformly measure the behavior of many AI Systems. ☆26 · Updated 3 months ago
- ☆76 · Updated 7 months ago
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆35 · Updated 5 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 6 months ago
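
For the context-length retrieval entry above, here is a minimal sketch of the kind of "needle in a haystack" measurement it describes: a known fact is planted at varying depths inside filler text and the model is asked to recover it. The `ask_model` stub, the filler sentence, and the secret-code needle are illustrative assumptions, not that repository's actual interface.

```python
# Minimal "needle in a haystack" sketch: plant a fact at several depths
# in a long context and score whether the model can retrieve it.

def build_haystack(needle: str, depth: float, total_words: int) -> str:
    """Embed the needle sentence at a relative depth (0.0-1.0) in filler text."""
    filler = ["The sky was clear that day."] * (total_words // 6)
    position = int(len(filler) * depth)
    filler.insert(position, needle)
    return " ".join(filler)

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real LLM API call.
    raise NotImplementedError

needle = "The secret code is 4817."
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    context = build_haystack(needle, depth, total_words=2000)
    prompt = f"{context}\n\nWhat is the secret code?"
    # answer = ask_model(prompt)
    # correct = "4817" in answer  # tally accuracy per (depth, context length)
```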
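
For the RAGElo entry above, here is a minimal sketch of the standard Elo update it builds on: each pairwise judgment between two agents' answers nudges their ratings toward the observed outcome. The function names, the K-factor of 32, and the judge-supplied outcome are illustrative assumptions, not RAGElo's actual API.

```python
# Elo-style ranking for pairwise comparisons of RAG agents' answers.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that agent A beats agent B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """outcome: 1.0 if A wins the judgment, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (outcome - e_a), r_b + k * ((1.0 - outcome) - (1.0 - e_a))

ratings = {"agent_a": 1000.0, "agent_b": 1000.0}
# In practice an LLM judge would supply the outcome for each answer pair.
ratings["agent_a"], ratings["agent_b"] = update(
    ratings["agent_a"], ratings["agent_b"], outcome=1.0
)
print(ratings)  # agent_a's rating rises, agent_b's falls by the same amount
```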