lukasberglund / reversal_curse
☆289, updated last year
Alternatives and similar repositories for reversal_curse
Users who are interested in reversal_curse are comparing it to the repositories listed below.
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) (☆505, updated last year)
- Benchmarking LLMs with Challenging Tasks from Real Users (☆228, updated 8 months ago)
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". (☆161, updated last year)
- Simple next-token-prediction for RLHF (☆228, updated last year)
- PASTA: Post-hoc Attention Steering for LLMs (☆120, updated 7 months ago)
- Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467 (☆289, updated 4 months ago)
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning (☆244, updated last year)
- (☆238, updated 2 years ago)
- Evaluating LLMs with fewer examples (☆160, updated last year)
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering (☆180, updated 5 months ago)
- RuLES: a benchmark for evaluating rule-following in language models (☆227, updated 4 months ago)
- The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset (☆157, updated last year)
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) (☆206, updated 2 years ago)
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (☆535, updated 5 months ago)
- (☆83, updated 5 months ago)
- (☆135, updated 8 months ago)
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" (☆81, updated last year)
- (☆150, updated last year)
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 (☆221, updated last year)
- (☆310, updated last year)
- Scaling Data-Constrained Language Models (☆337, updated 2 weeks ago)
- DSIR large-scale data selection framework for language model training (☆252, updated last year)
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. (☆203, updated 2 months ago)
- LOFT: A 1 Million+ Token Long-Context Benchmark (☆204, updated 3 weeks ago)
- (☆133, updated last year)
- A simple unified framework for evaluating LLMs (☆221, updated 2 months ago)
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" (☆350, updated last year)
- The official evaluation suite and dynamic data release for MixEval. (☆242, updated 8 months ago)
- (☆182, updated 2 months ago)
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' (☆220, updated 7 months ago)