mlcommons / modelgauge
Make it easy to automatically and uniformly measure the behavior of many AI systems.
☆26 · Updated 8 months ago
Alternatives and similar repositories for modelgauge
Users interested in modelgauge are comparing it to the libraries listed below.
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆92 · Updated this week
- ☆28 · Updated 2 years ago
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act. ☆94 · Updated last year
- Official code for the paper "Metadata Archaeology". ☆19 · Updated 2 years ago
- Official implementation of FIND (NeurIPS '23): Function Interpretation Benchmark and Automated Interpretability Agents. ☆49 · Updated 8 months ago
- ☆29 · Updated last year
- ☆12 · Updated 3 years ago
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆46 · Updated 11 months ago
- ☆26 · Updated 2 years ago
- Official repository for the paper "Zero-Shot AutoML with Pretrained Models". ☆47 · Updated last year
- Code for the ACL 2023 paper "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Sc…" ☆30 · Updated last year
- Code for the paper "Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints". ☆15 · Updated 7 months ago
- Understanding how features learned by neural networks evolve throughout training. ☆34 · Updated 7 months ago
- 🧠 Starter templates for doing interpretability research. ☆69 · Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆27 · Updated last year
- ☆17 · Updated 2 years ago
- ☆13 · Updated this week
- Official PyTorch implementation of "Meaning Representations from Trajectories in Autoregressive Models" (ICLR 2024). ☆21 · Updated last year
- ☆31 · Updated last year
- Official repository for "Dataset Inference for LLMs". ☆34 · Updated 10 months ago
- Testing language models for memorization of tabular datasets. ☆33 · Updated 3 months ago
- Finding semantically meaningful and accurate prompts. ☆46 · Updated last year
- Implementation of influence-function approximations for differently sized ML models, using PyTorch. ☆15 · Updated last year
- ☆60 · Updated 3 years ago
- ☆14 · Updated last year
- Official repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models". ☆108 · Updated last year
- BenchBench, a Python package to evaluate multi-task benchmarks. ☆15 · Updated 10 months ago
- DAM (Data Acquisition for ML) benchmark, part of the DataPerf benchmark suite, https://dataperf.org/. ☆24 · Updated 2 years ago
- Evaluation of neuro-symbolic engines. ☆35 · Updated 10 months ago
- ☆44 · Updated 6 months ago