dvruette / concept-guidance

Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vectors that control the behavior of LLMs at inference time.

☆21

Alternatives and similar repositories for concept-guidance

Users who are interested in concept-guidance are comparing it to the libraries listed below.

ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32 Updated 2 months ago
Zyphra / Zyda_processing
☆35 Updated last year
felixbinder / introspection_self_prediction
Code for experiments on self-prediction as a way to measure introspection in LLMs
☆15 Updated 7 months ago
microsoft / tale-suite
Text Adventure Learning Environment Suite - Benchmark to evaluate language models on interactive text environments.
☆18 Updated last month
apple / ml-planner
☆53 Updated last year
tyler-romero / microR1
Simple repository for training small reasoning models
☆33 Updated 5 months ago
samuelarnesen / nyu-debate-modeling
☆22 Updated 9 months ago
schauppi / Self-Rewarding-Language-Models
☆46 Updated last year
upiterbarg / lintseq
[ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)
☆19 Updated 5 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 5 months ago
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆46 Updated 3 months ago
brantondemoss / GrokkingComplexity
Code for
☆27 Updated 7 months ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆43 Updated last year
amudide / switch_sae
Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs)
☆25 Updated 7 months ago
doomslide / autoloom
Approximating the joint distribution of language models via MCTS
☆21 Updated 8 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated 10 months ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆39 Updated 8 months ago
kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆39 Updated 7 months ago
joshuacnf / Ctrl-G
☆86 Updated 6 months ago
xjdr-alt / muzero_sketch
☆38 Updated 11 months ago
YuchenJin / llm.c
LLM training in simple, raw C/CUDA
☆15 Updated 7 months ago
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24 Updated last year
NohTow / PPL-MCTS
Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22
☆66 Updated 2 years ago
arcee-ai / DAM
☆52 Updated 8 months ago
kilian-group / phantom-wiki
Python package for generating datasets to evaluate reasoning and retrieval of large language models
☆18 Updated 2 weeks ago
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆63 Updated last year
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 Updated last year
austrian-code-wizard / c3po
☆27 Updated 2 weeks ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆18 Updated 5 months ago
CarperAI / treasure_trove
☆22 Updated last year