thestephencasper / gpt4_bs

Examples of prompts that cause ChatGPT-4 to hallucinate.

☆31

Alternatives and similar repositories for gpt4_bs

Users who are interested in gpt4_bs are comparing it to the repositories listed below.

frankaging / Interchange-Intervention-Training
The codebase for Inducing Causal Structure for Interpretable Neural Networks
☆10 Updated 3 years ago
KihoPark / linear_rep_geometry
☆106 Updated 7 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆124 Updated 7 months ago
ARBORproject / arborproject.github.io
☆81 Updated 7 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆237 Updated 8 months ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆72 Updated 2 years ago
zouharvi / ryanize-bib
Highlight errors in a bib file: missing URLs, capitalization protection, etc
☆27 Updated last year
aadityasingh / icl-dynamics
☆22 Updated 5 months ago
mlepori1 / NeuroSurgeon
NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers
☆41 Updated 7 months ago
Aaquib111 / edge-attribution-patching
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
☆41 Updated last year
neelnanda-io / 1L-Sparse-Autoencoder
☆127 Updated last year
interpretingdl / eacl2024_transformer_interpretability_tutorial
Materials for EACL2024 tutorial: Transformer-specific Interpretability
☆60 Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆242 Updated last year
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆47 Updated 2 weeks ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆19 Updated 8 months ago
evandez / relations
How do transformer LMs encode relations?
☆53 Updated last year
redwoodresearch / Easy-Transformer
☆123 Updated last year
bartbussmann / BatchTopK
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
☆48 Updated 2 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆54 Updated 11 months ago
causalNLP / cladder
We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs.
☆126 Updated last year
hannamw / EAP-IG
☆51 Updated 2 months ago
collin-burns / discovering_latent_knowledge
☆276 Updated last year
causalNLP / corr2cause
Data and code for the Corr2Cause paper (ICLR 2024)
☆111 Updated last year
KihoPark / LLM_Categorical_Hierarchical_Representations
☆109 Updated 7 months ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆30 Updated 3 months ago
wesg52 / llm-context-neurons
Find context neurons in Pythia models.
☆14 Updated 2 years ago
wesg52 / sparse-probing-paper
Sparse probing paper full code.
☆61 Updated last year
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆209 Updated last week
ApolloResearch / deception-detection
☆18 Updated 7 months ago
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 Updated 8 months ago