mlcommons / modelgauge
Make it easy to automatically and uniformly measure the behavior of many AI Systems.
☆27 · Updated 6 months ago
Alternatives and similar repositories for modelgauge:
Users interested in modelgauge are comparing it to the libraries listed below.
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆88 · Updated this week
- Code for the ACL 2023 paper "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale". ☆29 · Updated last year
- Understanding how features learned by neural networks evolve throughout training. ☆34 · Updated 6 months ago
- ☆29 · Updated last year
- BenchBench is a Python package to evaluate multi-task benchmarks. ☆15 · Updated 9 months ago
- ModelDiff: A Framework for Comparing Learning Algorithms. ☆56 · Updated last year
- ☆14 · Updated last year
- ☆12 · Updated 3 years ago
- Sparse and discrete interpretability tool for neural networks. ☆62 · Updated last year
- ☆49 · Updated last year
- ☆34 · Updated last year
- The Foundation Model Transparency Index. ☆78 · Updated 11 months ago
- ☆42 · Updated last year
- Code for Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints. ☆15 · Updated 6 months ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch. ☆15 · Updated last year
- Official repository for Dataset Inference for LLMs. ☆33 · Updated 9 months ago
- Code for the paper "Can Foundation Models Help Us Achieve Perfect Secrecy?". ☆24 · Updated 2 years ago
- ☆28 · Updated last year
- ☆26 · Updated last year
- ☆23 · Updated 2 months ago
- Libraries for efficient and scalable group-structured dataset pipelines. ☆25 · Updated 4 months ago
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible. ☆42 · Updated last month
- ☆17 · Updated 2 years ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs. ☆36 · Updated 2 years ago
- Official repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models". ☆108 · Updated last year
- Few-shot Learning with Auxiliary Data. ☆27 · Updated last year
- Experiments to assess SPADE on different LLM pipelines. ☆16 · Updated last year
- Official implementation of FIND (NeurIPS '23): Function Interpretation Benchmark and Automated Interpretability Agents. ☆49 · Updated 7 months ago
- Testing Language Models for Memorization of Tabular Datasets. ☆33 · Updated 2 months ago
- In-context Example Selection with Influences. ☆15 · Updated last year