wesg52 / world-models
Extracting spatial and temporal world models from LLMs
Related projects
Alternatives and complementary repositories for world-models
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
- Emergent world representations: Exploring a sequence model trained on a synthetic task
- Representation Engineering: A Top-Down Approach to AI Transparency
- Training Sparse Autoencoders on Language Models
- An extensible benchmark for evaluating large language models on planning
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
- Inspecting and Editing Knowledge Representations in Language Models
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
- This repository collects all relevant resources about interpretability in LLMs
- RewardBench: the first evaluation tool for reward models.
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View
- Mass-editing thousands of facts into a transformer memory (ICLR 2023)
- RuLES: a benchmark for evaluating rule-following in language models
- Mechanistic Interpretability Visualizations using React
- Extract full next-token probabilities via language model APIs
- Using sparse coding to find distributed representations used by neural networks.
- Scaling Data-Constrained Language Models
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)
- GPT-4-based personalized arXiv paper assistant bot
- A puzzle to learn about prompting