openai / automated-interpretability

☆1,055

Alternatives and similar repositories for automated-interpretability

Users who are interested in automated-interpretability compare it to the libraries listed below.

anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,799 · Updated 5 months ago
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,073 · Updated 2 years ago
tatsu-lab / alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
☆835 · Updated last year
reasoning-machines / pal
PaL: Program-Aided Language Models (ICML 2023)
☆516 · Updated 2 years ago
andyzoujm / representation-engineering
Representation Engineering: A Top-Down Approach to AI Transparency
☆915 · Updated last year
IBM / Dromedary
Dromedary: towards helpful, ethical and reliable LLMs.
☆1,143 · Updated 2 months ago
likenneth / honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
☆556 · Updated 9 months ago
kmeng01 / memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
☆530 · Updated last year
yuchenlin / LLM-Blender
[ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive…
☆970 · Updated last year
inverse-scaling / prize
A prize for finding tasks that cause large language models to show inverse scaling
☆617 · Updated 2 years ago
kmeng01 / rome
Locating and editing factual associations in GPT (NeurIPS 2022)
☆693 · Updated last year
tatsu-lab / alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
☆1,906 · Updated 3 months ago
openai / sparse_autoencoder
☆543 · Updated last year
lucidrains / toolformer-pytorch
Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI
☆2,055 · Updated last year
google-research / FLAN
☆1,552 · Updated 3 weeks ago
abertsch72 / unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
☆1,063 · Updated last year
HazyResearch / ama_prompting
Ask Me Anything language model prompting
☆546 · Updated 2 years ago
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆546 · Updated 3 months ago
google-deepmind / long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
☆652 · Updated 3 months ago
princeton-nlp / MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
☆1,134 · Updated last year
conceptofmind / PaLM
An open-source implementation of Google's PaLM models
☆818 · Updated last year
EleutherAI / pythia
The hub for EleutherAI's work on interpretability and learning dynamics
☆2,676 · Updated last week
hendrycks / test
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,518 · Updated 2 years ago
suzgunmirac / BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆529 · Updated last year
sylinrl / TruthfulQA
TruthfulQA: Measuring How Models Imitate Human Falsehoods
☆840 · Updated 10 months ago
lucidrains / MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
☆654 · Updated 10 months ago
SinclairCoder / Instruction-Tuning-Papers
Reading list of Instruction-tuning. A trend starts from Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).
☆768 · Updated 2 years ago
madaan / self-refine
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
☆757 · Updated last year
allenai / natural-instructions
Expanding natural instructions
☆1,025 · Updated last year
keirp / automatic_prompt_engineer
☆1,324 · Updated last year