stanford-crfm / EUAIActJune15
Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act
☆93 · Updated last year
Alternatives and similar repositories for EUAIActJune15:
Users who are interested in EUAIActJune15 compare it to the repositories listed below.
- The Foundation Model Transparency Index ☆77 · Updated 10 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆177 · Updated last year
- ☆263 · Updated 2 months ago
- Command Line Interface for Hugging Face Inference Endpoints ☆66 · Updated 11 months ago
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆172 · Updated last year
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆214 · Updated last year
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 · Updated 2 years ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆107 · Updated last week
- ☆76 · Updated 9 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆83 · Updated this week
- ☆77 · Updated 2 years ago
- Framework for building and maintaining self-updating prompts for LLMs ☆61 · Updated 9 months ago
- ☆51 · Updated 10 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆102 · Updated 3 months ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆37 · Updated 11 months ago
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆37 · Updated 8 months ago
- ☆78 · Updated 10 months ago
- Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs. ☆23 · Updated 2 weeks ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 · Updated last year
- A tool for evaluating LLMs ☆408 · Updated 10 months ago
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 5 months ago
- ☆24 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆89 · Updated last year
- Leverage your LangChain trace data for fine tuning ☆41 · Updated 7 months ago
- Make it easy to automatically and uniformly measure the behavior of many AI Systems. ☆26 · Updated 5 months ago
- A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podc… ☆68 · Updated this week
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆109 · Updated 9 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆102 · Updated this week
- ReLM is a Regular Expression engine for Language Models ☆103 · Updated last year
- ☆27 · Updated 4 months ago