stanford-crfm/EUAIActJune15
Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act
☆94 · Updated last year
Alternatives and similar repositories for EUAIActJune15
Users interested in EUAIActJune15 are comparing it to the libraries listed below.
- Fiddler Auditor is a tool to evaluate language models. (☆179, updated last year)
- The Foundation Model Transparency Index (☆79, updated 11 months ago)
- 📚 A curated list of papers & technical articles on AI Quality & Safety (☆179, updated last month)
- An open-source compliance-centered evaluation framework for Generative AI models (☆148, updated last week)
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. (☆91, updated this week)
- Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central … (☆47, updated 11 months ago)
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. (☆66, updated 2 years ago)
- Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs. (☆23, updated 2 months ago)
- Check for data drift between two OpenAI multi-turn chat jsonl files. (☆37, updated last year)
- Framework for building and maintaining self-updating prompts for LLMs (☆62, updated 11 months ago)
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… (☆49, updated 10 months ago)
- Command Line Interface for Hugging Face Inference Endpoints (☆66, updated last year)
- Make it easy to automatically and uniformly measure the behavior of many AI systems. (☆27, updated 7 months ago)
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" (☆108, updated last year)
- Your buddy in the (L)LM space. (☆64, updated 7 months ago)
- ReLM is a Regular Expression engine for Language Models (☆104, updated last year)
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. (☆41, updated 9 months ago)
- Doing simple retrieval from LLMs at various context lengths to measure accuracy (☆99, updated last year)
- Collection of evals for Inspect AI (☆132, updated this week)
- AI Verify (☆8, updated this week)
- Red-Teaming Language Models with DSPy (☆192, updated 3 months ago)
- Functional Benchmarks and the Reasoning Gap (☆86, updated 7 months ago)