safety-research / false-facts
☆24 Updated 4 months ago
Alternatives and similar repositories for false-facts
Users who are interested in false-facts compare it to the libraries listed below.
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆50 Updated 11 months ago
- ☆111 Updated 9 months ago
- Redwood Research's transformer interpretability tools ☆14 Updated 3 years ago
- ☆36 Updated 2 years ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆129 Updated 9 months ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆31 Updated 7 months ago
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?" ☆33 Updated 10 months ago
- ☆36 Updated 3 years ago
- Measuring the situational awareness of language models ☆39 Updated last year
- Understanding how features learned by neural networks evolve throughout training ☆39 Updated last year
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆130 Updated 3 years ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 Updated last year
- Sparse and discrete interpretability tool for neural networks ☆64 Updated last year
- Code for the paper "Fishing for Magikarp" ☆174 Updated 6 months ago
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University ☆110 Updated this week
- Experiments for efforts to train a new and improved t5 ☆76 Updated last year
- ☆143 Updated 2 months ago
- ☆104 Updated 3 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆59 Updated last month
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 Updated 7 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆225 Updated last week
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆84 Updated 11 months ago
- Extract full next-token probabilities via language model APIs ☆247 Updated last year
- ☆62 Updated last month
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆60 Updated last year
- Mechanistic Interpretability for Transformer Models ☆53 Updated 3 years ago
- Utilities for the HuggingFace transformers library ☆71 Updated 2 years ago
- Sparse Autoencoder Training Library ☆55 Updated 6 months ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆72 Updated 7 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year