openai / automated-interpretability
☆1,004 · Updated last year
Alternatives and similar repositories for automated-interpretability:
Users interested in automated-interpretability are comparing it to the libraries listed below.
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,459 · Updated last month
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,728 · Updated last year
- Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI ☆2,024 · Updated 9 months ago
- [NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture. ☆762 · Updated 6 months ago
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) ☆483 · Updated last year
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆1,985 · Updated last year
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions ☆821 · Updated last year
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,101 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,470 · Updated last year
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆517 · Updated 2 months ago
- Dromedary: towards helpful, ethical and reliable LLMs. ☆1,141 · Updated last year
- Official implementation of our NeurIPS 2023 paper "Augmenting Language Models with Long-Term Memory". ☆791 · Updated last year
- [ACL 2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive… ☆935 · Updated 6 months ago
- Locating and editing factual associations in GPT (NeurIPS 2022) ☆627 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆819 · Updated 8 months ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data. ☆806 · Updated 9 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆486 · Updated 10 months ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆546 · Updated last year
- PaL: Program-Aided Language Models (ICML 2023) ☆488 · Updated last year
- ☆1,028 · Updated last year
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆721 · Updated 3 months ago
- Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Fl… ☆2,470 · Updated 8 months ago
- Data and code for the NeurIPS 2022 paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering". ☆656 · Updated 7 months ago
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.