apartresearch / readingwhatwecan

📚📚📚📚📚📚📚📚📚 Reading everything

☆15

Alternatives and similar repositories for readingwhatwecan

Users who are interested in readingwhatwecan are comparing it to the libraries listed below.

EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆212 · Updated last week
quantified-uncertainty / ai-safety-papers
☆22 · Updated 4 years ago
anthropics / evals
☆307 · Updated last year
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 · Updated 3 years ago
moirage / alignment-research-dataset
A dataset of alignment research and code to reproduce it
☆78 · Updated 2 years ago
anthropics / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆198 · Updated 3 years ago
collin-burns / discovering_latent_knowledge
☆281 · Updated last year
METR / public-tasks
☆106 · Updated last week
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆238 · Updated 9 months ago
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆191 · Updated 2 years ago
alignedai / HappyFaces
The Happy Faces Benchmark
☆15 · Updated 2 years ago
METR / task-standard
METR Task Standard
☆167 · Updated 9 months ago
aypan17 / machiavelli
☆139 · Updated 3 months ago
jessicarumbelow / Backwards
☆84 · Updated last year
google-research / cascades
Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference…
☆215 · Updated 5 months ago
minalee-research / coauthor-interface
☆100 · Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130 · Updated 3 years ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆71 · Updated 2 years ago
StampyAI / stampy-chat
Conversational chatbot to answer questions about AI Safety & Alignment based on information retrieved from the Alignment Research Dataset
☆15 · Updated last month
StampyAI / stampy-ui
AI Safety Q&A web frontend
☆41 · Updated this week
EleutherAI / steering-llama3
☆30 · Updated last year
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols for running control experiments.
☆128 · Updated this week
redwoodresearch / rust_circuit_public
☆65 · Updated 2 years ago
benlipkin / probsem
Probabilistic LLM evaluations. [CogSci2023; ACL2023]
☆72 · Updated last year
Mech-Interp / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆14 · Updated last year
JasonGross / guarantees-based-mechanistic-interpretability
☆17 · Updated last week
callummcdougall / sae_visualizer
☆29 · Updated last year
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆301 · Updated 11 months ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆114 · Updated 4 months ago
EleutherAI / project-menu
See the issue board for the current status of active and prospective projects!
☆65 · Updated 3 years ago