ai-safety-graph / AISafetyGraphLinks
AI Safety Graph
☆16 Updated 4 months ago
Alternatives and similar repositories for AISafetyGraph
Users who are interested in AISafetyGraph compare it to the libraries listed below.
- Inspect: A framework for large language model evaluations ☆1,225 Updated last week
- Benchmarks for the Evaluation of LLM Supervision ☆32 Updated last month
- ControlArena is a collection of settings, model organisms and protocols for running control experiments. ☆82 Updated this week
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆550 Updated last year
- METR Task Standard ☆157 Updated 6 months ago
- ☆289 Updated last year
- ☆658 Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆624 Updated this week
- A library for generative social simulation ☆984 Updated this week
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ☆788 Updated last week
- Representation Engineering: A Top-Down Approach to AI Transparency ☆855 Updated last year
- Mechanistic Interpretability Visualizations using React ☆273 Updated 7 months ago
- Collection of evals for Inspect AI ☆205 Updated this week
- ☆66 Updated 6 months ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆221 Updated last year
- ☆324 Updated this week
- Interpretability for sequence generation models 🐛 🔍 ☆432 Updated 3 months ago
- An open science effort to benchmark legal reasoning in foundation models ☆463 Updated 11 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆513 Updated last week
- Croissant is a high-level format for machine learning datasets that brings together four rich layers. ☆682 Updated last week
- Aligning AI With Shared Human Values (ICLR 2021) ☆290 Updated 2 years ago
- Machine Learning for Alignment Bootcamp ☆77 Updated 3 years ago
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆325 Updated last year
- utilities for decoding deep representations (like sentence embeddings) back to text ☆917 Updated last week
- (Model-written) LLM evals library ☆18 Updated 8 months ago
- Training Sparse Autoencoders on Language Models ☆910 Updated this week
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆790 Updated 6 months ago
- A library for making RepE control vectors ☆624 Updated 7 months ago
- ☆275 Updated last year
- ☆234 Updated 10 months ago