wesg52 / world-models
Extracting spatial and temporal world models from LLMs
☆255 Updated last year
Alternatives and similar repositories for world-models
Users who are interested in world-models are comparing it to the libraries listed below.
- Tools for understanding how transformer predictions are built layer-by-layer ☆497 Updated last year
- Simple next-token-prediction for RLHF ☆227 Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆832 Updated 9 months ago
- ☆269 Updated last year
- ☆237 Updated 2 years ago
- RuLES: a benchmark for evaluating rule-following in language models ☆224 Updated 3 months ago
- ☆292 Updated 11 months ago
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative to RLHF. Code for the paper "Training Socially Aligned Langu… ☆346 Updated last year
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) ☆494 Updated last year
- ☆207 Updated last year
- Scaling Data-Constrained Language Models ☆334 Updated 8 months ago
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆181 Updated last year
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆527 Updated 4 months ago
- Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467 ☆286 Updated 3 months ago
- Reasoning with Language Model is Planning with World Model ☆167 Updated last year
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight) ☆223 Updated this week
- RewardBench: the first evaluation tool for reward models. ☆590 Updated this week
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆207 Updated 2 years ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆193 Updated 6 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆201 Updated 5 months ago
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆129 Updated 3 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆353 Updated 7 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆169 Updated last month
- ☆117 Updated 10 months ago
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆493 Updated 4 months ago
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training ☆273 Updated last year
- ☆223 Updated 8 months ago
- ☆484 Updated 10 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆116 Updated last year
- Meta-Learning for Compositionality (MLC) for modeling human behavior ☆141 Updated last year