mlcommons / modelgauge
Make it easy to automatically and uniformly measure the behavior of many AI Systems.
☆26Updated 8 months ago
Alternatives and similar repositories for modelgauge
Users who are interested in modelgauge are comparing it to the libraries listed below.
- Run safety benchmarks against AI models and view detailed reports showing how well they performed.☆94Updated this week
- Understanding how features learned by neural networks evolve throughout training☆36Updated 8 months ago
- Sparse and discrete interpretability tool for neural networks☆63Updated last year
- ☆54Updated 2 years ago
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act☆94Updated last year
- ☆12Updated 3 years ago
- ModelDiff: A Framework for Comparing Learning Algorithms☆57Updated last year
- ☆29Updated 2 years ago
- BenchBench is a Python package to evaluate multi-task benchmarks.☆15Updated 11 months ago
- ☆55Updated 9 months ago
- ☆36Updated 2 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…☆27Updated last year
- git extension for {collaborative, communal, continual} model development☆213Updated 7 months ago
- ☆29Updated 2 years ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data☆30Updated 9 months ago
- ☆74Updated last year
- Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024)☆21Updated last year
- ☆134Updated 2 months ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs☆36Updated 2 years ago
- ☆26Updated 2 years ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch☆15Updated last year
- Code for the ACL 2023 paper: "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Sc…☆30Updated last year
- Official code for the paper: "Metadata Archaeology"☆19Updated 2 years ago
- Repository of machine learning benchmarks☆36Updated 3 weeks ago
- Ludwig benchmark☆20Updated 3 years ago
- Libraries for efficient and scalable group-structured dataset pipelines.☆26Updated last week
- 🧠 Starter templates for doing interpretability research☆71Updated last year
- Code for Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints☆15Updated 8 months ago
- ☆28Updated last week
- ☆99Updated 4 months ago