stanford-crfm / EUAIActJune15
Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act
☆92 · Updated last year
Related projects
Alternatives and complementary repositories for EUAIActJune15
- Fiddler Auditor is a tool to evaluate language models. ☆171 · Updated 8 months ago
- Leverage your LangChain trace data for fine-tuning. ☆38 · Updated 3 months ago
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆41 · Updated 11 months ago
- Framework for building and maintaining self-updating prompts for LLMs ☆59 · Updated 5 months ago
- AI Data Management & Evaluation Platform ☆215 · Updated last year
- Simple retrieval from LLMs at various context lengths to measure accuracy ☆97 · Updated 7 months ago
- Command Line Interface for Hugging Face Inference Endpoints ☆66 · Updated 7 months ago
- The Foundation Model Transparency Index ☆71 · Updated 5 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 · Updated last year
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆33 · Updated 3 months ago
- Automatic Evals for Instruction-Tuned Models ☆45 · Updated this week
- Check for data drift between two OpenAI multi-turn chat JSONL files. ☆36 · Updated 7 months ago
- Sample notebooks and prompts for LLM evaluation ☆114 · Updated this week
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆66 · Updated this week
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- An open-source compliance-centered evaluation framework for Generative AI models ☆104 · Updated last week
- 📖 A curated list of resources dedicated to synthetic data ☆118 · Updated 2 years ago
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆161 · Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 4 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆100 · Updated 2 months ago
- Let's build better datasets, together! ☆205 · Updated this week
- ReLM is a Regular Expression engine for Language Models ☆104 · Updated last year
- An open-source tool to assess and improve the trustworthiness of AI systems. ☆79 · Updated this week