carlini / chess-llm
Play chess against large language models.
☆38 Updated 9 months ago
Related projects
Alternatives and complementary repositories for chess-llm
- ☆101 Updated 3 months ago
- Plug in & Play Pytorch Implementation of the paper: "Evolutionary Optimization of Model Merging Recipes" by Sakana AI ☆26 Updated last week
- Measuring the situational awareness of language models ☆33 Updated 9 months ago
- ☆90 Updated 4 months ago
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆34 Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆61 Updated last week
- ☆24 Updated 7 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆179 Updated 5 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆93 Updated 3 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆107 Updated 5 months ago
- ☆28 Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- ☆39 Updated 10 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆62 Updated 5 months ago
- ☆18 Updated last month
- Improving Alignment and Robustness with Circuit Breakers ☆154 Updated last month
- ☆44 Updated last month
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 Updated this week
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆34 Updated last month
- Code for Discovering Preference Optimization Algorithms with and for Large Language Models ☆51 Updated 5 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆162 Updated last month
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆37 Updated 5 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆42 Updated 6 months ago
- Text-based game of lies and deceit, made for language models. ☆29 Updated last year
- ☆80 Updated last month
- Does Refusal Training in LLMs Generalize to the Past Tense? [NeurIPS 2024 Safe Generative AI Workshop (Oral)] ☆57 Updated last month
- ☆62 Updated 3 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆113 Updated 7 months ago
- Replicating O1 inference-time scaling laws ☆49 Updated last month
- 🧠 Starter templates for doing interpretability research ☆63 Updated last year