wesg52 / world-models
Extracting spatial and temporal world models from LLMs
☆254 · Updated last year
Alternatives and similar repositories for world-models:
Users who are interested in world-models are comparing it to the libraries listed below.
- Tools for understanding how transformer predictions are built layer-by-layer ☆485 · Updated 10 months ago
- ☆286 · Updated 10 months ago
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆180 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆818 · Updated 8 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆194 · Updated 4 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆339 · Updated 5 months ago
- Mechanistic Interpretability Visualizations using React ☆241 · Updated 4 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆115 · Updated last year
- GPT-4-based personalized arXiv paper assistant bot ☆516 · Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆233 · Updated last year
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆206 · Updated last year
- ☆114 · Updated 8 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View ☆114 · Updated 11 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆303 · Updated 5 months ago
- ☆217 · Updated 6 months ago
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆517 · Updated 2 months ago
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆480 · Updated 3 months ago
- Meta-Learning for Compositionality (MLC) for modeling human behavior ☆141 · Updated last year
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative to RLHF. Code for the paper "Training Socially Aligned Langu… ☆347 · Updated last year
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training ☆264 · Updated 10 months ago
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆425 · Updated last year
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆74 · Updated last year
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆203 · Updated 2 years ago
- ☆264 · Updated last year
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight) ☆208 · Updated this week
- Recurrent Memory Transformer ☆149 · Updated last year
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆196 · Updated last year
- ☆126 · Updated 5 months ago
- RewardBench: the first evaluation tool for reward models. ☆553 · Updated last month
- Scaling Data-Constrained Language Models ☆334 · Updated 7 months ago