⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
☆118 · Oct 27, 2025 · Updated 5 months ago
Alternatives and similar repositories for thought-anchors
Users that are interested in thought-anchors are comparing it to the libraries listed below.
- ☆34 · Jul 9, 2025 · Updated 9 months ago
- Repository for the "Chain-of-Thought Reasoning In The Wild Is Not Always Faithful" paper ☆32 · Mar 31, 2026 · Updated last week
- ☆22 · Feb 13, 2026 · Updated 2 months ago
- RAG based chatbot for Global AI Hub ☆26 · Oct 4, 2025 · Updated 6 months ago
- A Framework for Evaluating AI Agent Safety in Realistic Environments ☆31 · Oct 2, 2025 · Updated 6 months ago
- Restore safety in fine-tuned language models through task arithmetic ☆32 · Mar 28, 2024 · Updated 2 years ago
- Exemplary, annotated machine learning pipeline for any tabular data problem. ☆27 · Aug 30, 2019 · Updated 6 years ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆65 · Oct 24, 2025 · Updated 5 months ago
- A library for training crosscoders ☆16 · May 28, 2025 · Updated 10 months ago
- ☆279 · Oct 1, 2024 · Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆47 · May 31, 2024 · Updated last year
- Code repo for the model organisms and convergent directions of EM papers. ☆59 · Sep 22, 2025 · Updated 6 months ago
- Code and materials for "Weird Generalization and Inductive Backdoors" ☆37 · Jan 11, 2026 · Updated 3 months ago
- [CVPR 2026 Main] MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation ☆21 · Mar 26, 2026 · Updated 2 weeks ago
- A tiny easily hackable implementation of a feature dashboard. ☆16 · Oct 21, 2025 · Updated 5 months ago
- ☆54 · Feb 19, 2025 · Updated last year
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University ☆307 · Feb 8, 2026 · Updated 2 months ago
- [ICML 2025] Unlearning in Diffusion Models using Sparse Autoencoders ☆55 · Oct 16, 2025 · Updated 5 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆28 · Aug 9, 2025 · Updated 8 months ago
- ☆66 · Jul 14, 2025 · Updated 8 months ago
- James' cookbook of evaluations and finetuning experiments ☆26 · Feb 19, 2026 · Updated last month
- Code for steering and monitoring with concepts vectors in LLMs. https://arxiv.org/abs/2502.03708 ☆28 · Aug 10, 2025 · Updated 8 months ago
- Open source interpretability artefacts for R1. ☆172 · Apr 21, 2025 · Updated 11 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆144 · Sep 14, 2022 · Updated 3 years ago
- ☆17 · Feb 14, 2024 · Updated 2 years ago
- https://transformer-circuits.pub/2025/attribution-graphs/methods.html ☆96 · Mar 27, 2025 · Updated last year
- This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix … ☆137 · Feb 8, 2026 · Updated 2 months ago
- ☆58 · Nov 19, 2024 · Updated last year
- ☆39 · Jun 14, 2025 · Updated 9 months ago
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges ☆28 · May 14, 2025 · Updated 10 months ago
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search ☆17 · Jan 24, 2026 · Updated 2 months ago
- Codebase for information theoretic Shapley values to explain predictive uncertainty. This repo contains the code related to the paper Watso… ☆22 · Jul 4, 2024 · Updated last year
- GoldFinch and other hybrid transformer components ☆46 · Jul 20, 2024 · Updated last year
- ☆21 · Apr 15, 2025 · Updated 11 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆43 · Sep 18, 2025 · Updated 6 months ago
- Stochastic Parameter Decomposition ☆70 · Updated this week
- ☆25 · Dec 20, 2023 · Updated 2 years ago
- Sparsify transformers with SAEs and transcoders ☆704 · Updated this week
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆35 · Oct 28, 2025 · Updated 5 months ago