wesg52 / world-models

Extracting spatial and temporal world models from LLMs

☆255

Alternatives and similar repositories for world-models

Users who are interested in world-models are comparing it to the libraries listed below.

AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆500 Updated last year
jayelm / gisting
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
☆289 Updated 4 months ago
csinva / interpretable-embeddings
Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)
☆37 Updated 7 months ago
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆181 Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆202 Updated 6 months ago
redwoodresearch / Easy-Transformer
☆120 Updated 10 months ago
da03 / implicit_chain_of_thought
☆132 Updated 7 months ago
lukasberglund / reversal_curse
☆288 Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆170 Updated 2 months ago
FranxYao / GPT-Bargaining
Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
☆207 Updated 2 years ago
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆317 Updated 7 months ago
collin-burns / discovering_latent_knowledge
☆270 Updated last year
huggingface / datablations
Scaling Data-Constrained Language Models
☆335 Updated 9 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆184 Updated this week
KihoPark / LLM_Categorical_Hierarchical_Representations
☆99 Updated 4 months ago
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆219 Updated 6 months ago
gabegrand / world-models
☆207 Updated last year
kmeng01 / memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
☆500 Updated last year
snap-stanford / MLAgentBench
☆294 Updated last year
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆359 Updated 7 months ago
KihoPark / linear_rep_geometry
☆95 Updated 4 months ago
vinid / NegotiationArena
☆69 Updated last year
mechanistic-interpretability-grokking / progress-measures-paper
☆67 Updated 2 years ago
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆116 Updated last year
sotopia-lab / sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
☆224 Updated this week
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024)
☆190 Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆127 Updated 2 years ago
openai / sparse_autoencoder
☆495 Updated 11 months ago
GFNOrg / gfn-lm-tuning
☆180 Updated last year
benpry / why-think-step-by-step
Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"
☆60 Updated 2 months ago