quantified-uncertainty / ai-safety-papers

☆21

Alternatives and similar repositories for ai-safety-papers

Users who are interested in ai-safety-papers are comparing it to the repositories listed below

moirage / alignment-research-dataset
A dataset of alignment research and code to reproduce it
☆77 Updated 2 years ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆209 Updated last week
anthropics / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆194 Updated 3 years ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆51 Updated 3 years ago
alignedai / HappyFaces
The Happy Faces Benchmark
☆15 Updated 2 years ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆100 Updated last month
apartresearch / readingwhatwecan
📚📚📚📚📚📚📚📚📚 Reading everything
☆14 Updated 3 months ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14 Updated 3 years ago
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆73 Updated 2 years ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆128 Updated 2 years ago
socketteer / worldspider
gpt completions in vscode
☆34 Updated 2 years ago
thestephencasper / everything-you-need
we got you bro
☆36 Updated last year
METR / task-template
☆9 Updated last year
volotat / ARC-Game
The Abstraction and Reasoning Corpus made into a web game
☆90 Updated 11 months ago
redwoodresearch / rust_circuit_public
☆63 Updated 2 years ago
samacqua / LARC
Language-annotated Abstraction and Reasoning Corpus
☆90 Updated 2 years ago
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆76 Updated 3 years ago
JacobPfau / procgenAISC
☆19 Updated 2 years ago
JasonGross / guarantees-based-mechanistic-interpretability
☆16 Updated this week
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆186 Updated 2 years ago
anthropics / evals
☆287 Updated last year
keyonvafa / world-model-evaluation
☆62 Updated 8 months ago
irregular-rhomboid / EAI-Math-Reading-Group
Resources from the EleutherAI Math Reading Group
☆53 Updated 5 months ago
RewardReports / reward-reports
Documentation for dynamic machine learning systems.
☆29 Updated 10 months ago
jbloomAus / DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
☆87 Updated last year
google-deepmind / neural_networks_chomsky_hierarchy
Neural Networks and the Chomsky Hierarchy
☆207 Updated last year
apple / ml-np-rasp
☆19 Updated last year
DAIOS-AI / mindscript
A programming language for formal/informal computation.
☆41 Updated last week
Kiv / fancy_einsum
Einsum with einops style variable names
☆17 Updated last year
google-research / cascades
Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference…
☆208 Updated 2 months ago