surajsrivats / the-most-starred-repo-ever

☆17

Related projects:

samacqua / LARC
Language-annotated Abstraction and Reasoning Corpus
☆76 Updated last year
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆48 Updated 2 years ago
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆62 Updated 2 years ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆11 Updated 2 years ago
jessicarumbelow / Backwards
☆72 Updated 2 months ago
yashbonde / rasp
Implementing the RASP transformer programming language (https://arxiv.org/pdf/2106.06981.pdf).
☆42 Updated 3 years ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆55 Updated last year
google-research / cascades
Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference…
☆190 Updated 3 months ago
UlisseMini / procgen-tools
Tools for running experiments on RL agents in procgen environments
☆16 Updated 5 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆226 Updated 6 months ago
LRudL / evalugator
(Model-written) LLM evals library
☆14 Updated last month
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆88 Updated 2 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆175 Updated 2 months ago
moirage / alignment-research-dataset
A dataset of alignment research and code to reproduce it
☆68 Updated last year
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆165 Updated last year
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆185 Updated 7 months ago
google-deepmind / neural_networks_chomsky_hierarchy
Neural Networks and the Chomsky Hierarchy
☆183 Updated 5 months ago
neoneye / arc-notes
My writings about ARC (Abstraction and Reasoning Corpus)
☆55 Updated 3 weeks ago
srush / torch-golf
Silly twitter torch implementations.
☆46 Updated last year
JacobPfau / procgenAISC
☆18 Updated last year
EleutherAI / project-menu
See the issue board for the current status of active and prospective projects!
☆65 Updated 2 years ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆182 Updated this week
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆69 Updated this week
jacobdunefsky / transcoder_circuits
☆33 Updated 3 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆174 Updated 4 months ago
redwoodresearch / rust_circuit_public
☆59 Updated last year
thestephencasper / everything-you-need
we got you bro
☆32 Updated last month
aogara-ds / hoodwinked-website
A text-based game where language models learn to lie and to detect lies.
☆11 Updated 11 months ago
KihoPark / LLM_Categorical_Hierarchical_Representations
☆66 Updated last month
aidangomez / weblm
Drive a browser with Cohere
☆72 Updated last year