haizelabs / bijection-learning
☆26 · Updated 11 months ago
Alternatives and similar repositories for bijection-learning
Users interested in bijection-learning are comparing it to the repositories listed below.
- Sphynx Hallucination Induction · ☆53 · Updated 8 months ago
- Red-Teaming Language Models with DSPy · ☆216 · Updated 8 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. · ☆96 · Updated 5 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper · ☆51 · Updated 6 months ago
- Inference-time scaling for LLMs-as-a-judge. · ☆300 · Updated last week
- ☆109 · Updated 5 months ago
- Open source interpretability artefacts for R1. · ☆161 · Updated 5 months ago
- ☆59 · Updated 2 weeks ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" · ☆55 · Updated 7 months ago
- ☆34 · Updated 4 months ago
- Functional Benchmarks and the Reasoning Gap · ☆89 · Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ · ☆34 · Updated 5 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. · ☆172 · Updated 8 months ago
- look how they massacred my boy · ☆63 · Updated 11 months ago
- ☆102 · Updated 9 months ago
- ☆142 · Updated last month
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. · ☆115 · Updated this week
- smolLM with Entropix sampler on PyTorch · ☆150 · Updated 11 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. · ☆56 · Updated 7 months ago
- ☆216 · Updated 7 months ago
- ☆135 · Updated 6 months ago
- Plotting (entropy, varentropy) for small LMs · ☆98 · Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs · ☆118 · Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". · ☆118 · Updated last year
- Synthetic data derived by templating, few-shot prompting, transformations on public-domain corpora, and Monte Carlo tree search. · ☆32 · Updated 7 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? · ☆77 · Updated 6 months ago
- Train your own SOTA deductive reasoning model · ☆107 · Updated 7 months ago
- ☆136 · Updated this week
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. · ☆98 · Updated 2 months ago
- ⚖️ Awesome LLM Judges ⚖️ · ☆130 · Updated 5 months ago