alan-turing-institute / robots-in-disguise
Information and materials for the Turing's "robots-in-disguise" reading group on fundamental AI research.
☆33 Updated 4 months ago
Alternatives and similar repositories for robots-in-disguise
Users who are interested in robots-in-disguise are comparing it to the libraries listed below.
- we got you bro ☆36 Updated last year
- MetaQuantus is an XAI performance tool to identify reliable evaluation metrics ☆37 Updated last year
- 🧠 Starter templates for doing interpretability research ☆73 Updated 2 years ago
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs). ☆292 Updated 2 weeks ago
- OpenXAI: Towards a Transparent Evaluation of Model Explanations ☆247 Updated 11 months ago
- Causal Responsibility EXplanations for Image Classifiers and Tabular Data ☆35 Updated last week
- ☆73 Updated 2 years ago
- List of ML conferences with important dates and accepted paper list ☆138 Updated 3 months ago
- Deep Learning, an Energy Approach ☆199 Updated 2 months ago
- Fairness toolkit for PyTorch, scikit-learn, and AutoGluon ☆32 Updated 8 months ago
- PyTorch code corresponding to my blog series on adversarial examples and (confidence-calibrated) adversarial training. ☆68 Updated 2 years ago
- ☆81 Updated 5 months ago
- Tools for studying developmental interpretability in neural networks. ☆100 Updated last month
- ☆80 Updated last year
- This repository collects all relevant resources about interpretability in LLMs ☆368 Updated 9 months ago
- ☆66 Updated 2 years ago
- LENS Project ☆48 Updated last year
- The boundary of neural network trainability is fractal ☆215 Updated last year
- Official repository for CMU Machine Learning Department's 10721: "Philosophical Foundations of Machine Intelligence". ☆262 Updated 2 years ago
- 👋 Overcomplete is a Vision-based SAE Toolbox ☆72 Updated last week
- Uncertainty quantification with PyTorch ☆367 Updated 3 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆512 Updated last year
- Mechanistic Interpretability Visualizations using React ☆273 Updated 7 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆257 Updated last year
- PyTorch-centric library for evaluating and enhancing the robustness of AI technologies ☆57 Updated last year
- ☆31 Updated 8 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆128 Updated 2 years ago
- Starting kit for the NeurIPS 2023 unlearning challenge ☆378 Updated last year
- Materials of the Nordic Probabilistic AI School 2023. ☆90 Updated last year
- ☆326 Updated 3 weeks ago