safety-research / false-factsLinks

☆24

Alternatives and similar repositories for false-facts

Users that are interested in false-facts are comparing it to the libraries listed below

Sorting:

aryamanarora / causalgym
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆50Updated 11 months ago
KihoPark / LLM_Categorical_Hierarchical_Representations
☆111Updated 9 months ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14Updated 3 years ago
jmerullo / lm_vector_arithmetic
☆36Updated 2 years ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆129Updated 9 months ago
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆31Updated 7 months ago
UKPLab / on-emergence
Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning
☆33Updated 10 months ago
shauli-ravfogel / rlace-icml
☆36Updated 3 years ago
AsaCooperStickland / situational-awareness-evals
Measuring the situational awareness of language models
☆39Updated last year
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆39Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130Updated 3 years ago
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75Updated last year
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆64Updated last year
cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆174Updated 6 months ago
mega002 / llm-interp-tau
Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
☆110Updated this week
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆76Updated last year
google-deepmind / mishax
☆143Updated 2 months ago
MinhxLe / subliminal-learning
☆104Updated 3 months ago
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆59Updated last month
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆34Updated 7 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆225Updated last week
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆84Updated 11 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆247Updated last year
google-deepmind / dangerous-capability-evaluations
☆62Updated last month
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆60Updated last year
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53Updated 3 years ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆71Updated 2 years ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55Updated 6 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆72Updated 7 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆89Updated last year