Wenyueh / inductive_reasoning_benchmark
inductive reasoning benchmark with subregular hierarchy for string-to-string transformation
⭐ 14 · Updated 7 months ago
Alternatives and similar repositories for inductive_reasoning_benchmark
Users who are interested in inductive_reasoning_benchmark are comparing it to the libraries listed below.
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods · ⭐ 163 · Updated 7 months ago
- AI Logging for Interpretability and Explainability 🔬 · ⭐ 140 · Updated last year
- Algebraic value editing in pretrained language models · ⭐ 67 · Updated 2 years ago
- [ICLR 2025] General-purpose activation steering library · ⭐ 141 · Updated 4 months ago
- ⭐ 51 · Updated 2 years ago
- Forcing Diffuse Distributions out of Language Models · ⭐ 18 · Updated last year
- ⭐ 104 · Updated 2 years ago
- A library for efficient patching and automatic circuit discovery. · ⭐ 88 · Updated last month
- Steering Llama 2 with Contrastive Activation Addition · ⭐ 207 · Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation. · ⭐ 109 · Updated 2 weeks ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces · ⭐ 100 · Updated 2 years ago
- ⭐ 99 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" · ⭐ 124 · Updated last year
- Function Vectors in Large Language Models (ICLR 2024) · ⭐ 191 · Updated 9 months ago
- ⭐ 203 · Updated 9 months ago
- ⭐ 206 · Updated 3 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. · ⭐ 85 · Updated 11 months ago
- ⭐ 85 · Updated last year
- [NeurIPS'24 Spotlight] Observational Scaling Laws · ⭐ 58 · Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … · ⭐ 241 · Updated 2 weeks ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". · ⭐ 80 · Updated last year
- Test-time-training on nearest neighbors for large language models · ⭐ 49 · Updated last year
- Improving Alignment and Robustness with Circuit Breakers · ⭐ 258 · Updated last year
- ⭐ 197 · Updated last year
- ⭐ 247 · Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision · ⭐ 124 · Updated last year
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… · ⭐ 12 · Updated last year
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… · ⭐ 55 · Updated last year
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" · ⭐ 229 · Updated last year
- A resource repository for representation engineering in large language models · ⭐ 148 · Updated last year