DeqingFu / transformers-icl-second-order
Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models.
☆17 · Updated 7 months ago
Alternatives and similar repositories for transformers-icl-second-order
Users who are interested in transformers-icl-second-order are comparing it to the libraries listed below.
- ☆233 · Updated last year
- ☆99 · Updated 5 months ago
- ☆183 · Updated last year
- ☆43 · Updated last year
- ☆83 · Updated last year
- ☆23 · Updated 5 months ago
- Rewarded soups official implementation ☆58 · Updated last year
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆26 · Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆49 · Updated 9 months ago
- A library for efficient patching and automatic circuit discovery. ☆70 · Updated 2 months ago
- ☆95 · Updated last year
- Official code for the paper "Probing the Decision Boundaries of In-context Learning in Large Language Models." https://arxiv.org/abs/2406.11233… ☆18 · Updated 10 months ago
- ☆32 · Updated 2 years ago
- ☆18 · Updated 7 months ago
- ☆18 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆112 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆74 · Updated 4 months ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆108 · Updated last year
- Universal Neurons in GPT2 Language Models ☆30 · Updated last year
- A repo for RLHF training and best-of-N (BoN) sampling over LLMs, with support for reward model ensembles. ☆44 · Updated 6 months ago
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023). ☆16 · Updated 6 months ago
- ☆40 · Updated last month
- ☆14 · Updated last year
- What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆34 · Updated 2 weeks ago
- PyTorch code for experiments on Linear Transformers ☆21 · Updated last year
- ☆121 · Updated 11 months ago
- ☆28 · Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆40 · Updated last year
- Sparse Autoencoder Training Library ☆53 · Updated 2 months ago
- A curated reading list of research in Sparse Autoencoders, Feature Extraction and related topics in Mechanistic Interpretability ☆21 · Updated 5 months ago