dan-gittik / infosec17

☆1

Alternatives and similar repositories for infosec17

Users who are interested in infosec17 are comparing it to the libraries listed below.

google-deepmind / tracr
☆540 Updated last year
curt-tigges / probity
☆15 Updated 3 months ago
Kiv / fancy_einsum
Einsum with einops style variable names
☆16 Updated last year
collin-burns / discovering_latent_knowledge
☆273 Updated last year
carlini / pycallcc
Discount jupyter.
☆51 Updated 4 months ago
samuela / git-re-basin
Code release for "Git Re-Basin: Merging Models modulo Permutation Symmetries"
☆484 Updated 2 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆231 Updated 9 months ago
aengusl / latent-adversarial-training
☆40 Updated 9 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆262 Updated 7 months ago
centerforaisafety / Intro_to_ML_Safety
☆72 Updated 2 years ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆505 Updated last year
LRudL / evalugator
(Model-written) LLM evals library
☆18 Updated 7 months ago
mmazeika / tdc-starter-kit
Starter kit and data loading code for the Trojan Detection Challenge NeurIPS 2022 competition
☆33 Updated last year
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆99 Updated 3 weeks ago
EffiSciencesResearch / ML4G
Machine Learning for Alignment Bootcamp
☆25 Updated last year
Butanium / tiny-activation-dashboard
A tiny easily hackable implementation of a feature dashboard.
☆12 Updated 2 weeks ago
openai / automated-interpretability
☆1,025 Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆259 Updated last year
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆608 Updated last week
neelnanda-io / 1L-Sparse-Autoencoder
☆123 Updated last year
jessicarumbelow / Backwards
☆82 Updated last year
inverse-scaling / prize
A prize for finding tasks that cause large language models to show inverse scaling
☆613 Updated last year
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆217 Updated last year
centerforaisafety / tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆90 Updated last year
jbloomAus / SAELens
Training Sparse Autoencoders on Language Models
☆879 Updated this week
jim-berend / semanticlens
☆13 Updated 2 months ago
HazyResearch / safari
Convolutions for Sequence Modeling
☆893 Updated last year
HazyResearch / H3
Language Modeling with the H3 State Space Model
☆520 Updated last year
callummcdougall / ARENA_3.0
☆617 Updated last week
adamkarvonen / SAEBench
☆107 Updated this week