thestephencasper / gpt4_bs
Examples of prompts that cause ChatGPT-4 to hallucinate.
☆31Updated 2 years ago
Alternatives and similar repositories for gpt4_bs
Users that are interested in gpt4_bs are comparing it to the libraries listed below
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs.☆133Updated last year
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers☆42Updated 10 months ago
- ☆259Updated last year
- A mechanistic approach for understanding and detecting factual errors of large language models.☆49Updated last year
- TalkToModel gives anyone the powers of XAI through natural language conversations 💬!☆125Updated 2 years ago
- ☆283Updated last year
- ☆57Updated 2 years ago
- Erasing concepts from neural representations with provable guarantees☆239Updated 10 months ago
- Materials for EACL2024 tutorial: Transformer-specific Interpretability☆61Updated last year
- ☆111Updated 10 months ago
- ☆83Updated 9 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs.☆57Updated last month
- ☆22Updated 3 months ago
- ☆65Updated 4 months ago
- A library for efficient patching and automatic circuit discovery.☆80Updated 4 months ago
- ☆132Updated 2 years ago
- Steering vectors for transformer language models in PyTorch / Huggingface☆132Updated 9 months ago
- we got you bro☆36Updated last year
- ☆133Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…☆28Updated last year
- Aligning AI With Shared Human Values (ICLR 2021)☆305Updated 2 years ago
- Landing page for MIB: A Mechanistic Interpretability Benchmark☆21Updated 4 months ago
- ☆116Updated last year
- PAIR.withgoogle.com and friends' work on interpretability methods☆215Updated 2 weeks ago
- ☆95Updated last year
- Highlight errors in a bib file: missing URLs, capitalization protection, etc☆27Updated last year
- The Prism Alignment Project☆86Updated last year
- ☆27Updated 2 years ago
- ☆56Updated 2 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations.☆215Updated last week