safety-research / finetuning-auditor
Auditing agents for fine-tuning safety
☆18Oct 21, 2025Updated 3 months ago
Alternatives and similar repositories for finetuning-auditor
Users who are interested in finetuning-auditor compare it to the libraries listed below.
- ☆27Oct 6, 2024Updated last year
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.☆57Updated this week
- Open Source Replication of Anthropic's Alignment Faking Paper☆54Apr 4, 2025Updated 10 months ago
- Project exploring 3D volumetric rendering of NEXRAD radar data.☆11Oct 23, 2023Updated 2 years ago
- ☆71Updated this week
- A library for training crosscoders☆15May 28, 2025Updated 8 months ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)☆60Jul 24, 2025Updated 6 months ago
- [ICML 2025] Repository for M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture☆17Nov 4, 2025Updated 3 months ago
- An analog touch screen joystick that pretends to be a bevy gamepad☆13Jul 13, 2024Updated last year
- The AI that helps you achieve your goals☆11Feb 4, 2024Updated 2 years ago
- Proof of concept of view_maybe☆12Dec 9, 2024Updated last year
- see github.com/understanding-search/maze-transformer☆10Dec 8, 2023Updated 2 years ago
- Service offerings expressed with Orchestra☆12Jan 5, 2026Updated last month
- Fast wavelet transforms on the sphere☆13Dec 20, 2016Updated 9 years ago
- ☆34Updated this week
- ☆12Updated this week
- ✒️ A gallery of experiments with Scalable Vector Graphics (SVG) and interactive visualizations.☆13Jan 6, 2023Updated 3 years ago
- Customizable charts made with TikZ and LaTeX3☆14Feb 11, 2023Updated 3 years ago
- strongDM SDK for the Python programming language.☆15Updated this week
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 7 months ago
- 🧠 Inspecting complexity and goal-directedness of imagination in an fNIRS BCI system.☆11Aug 26, 2023Updated 2 years ago
- Flight Recorder lets you record a client program's execution and examine it later☆11Sep 18, 2020Updated 5 years ago
- A repository that contains LeetCode, HackerRank, and CodeSignal problems☆10Mar 13, 2023Updated 2 years ago
- Minimal coding, computer-use and deep research agents using the OpenAI Agents SDK☆27Feb 5, 2026Updated last week
- Code for the API, workload execution, and agents underlying the LLMail-Inject Adaptive Prompt Injection Challenge☆19Oct 21, 2025Updated 3 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons☆13Feb 13, 2023Updated 3 years ago
- ☆10Nov 1, 2024Updated last year
- First neural GPT aligned with text and speech. Join us in building a better foundation model for the neural modality.☆14Oct 30, 2024Updated last year
- Flexible memory allocation tool for multi-tiered memory systems☆13Jan 7, 2026Updated last month
- ☆13Feb 9, 2019Updated 7 years ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code☆10Aug 29, 2023Updated 2 years ago
- Ansible Playbook to deploy and configure services on remote or local host.☆11Feb 6, 2026Updated last week
- Automated terminal emulator benchmarks☆22Jan 14, 2026Updated last month
- An implementation of base85 encoding, which is more space-efficient than base64☆14Jun 25, 2023Updated 2 years ago
- Versions of gcc running on ubuntu☆15Aug 9, 2025Updated 6 months ago
- ☆16Nov 18, 2024Updated last year
- ☆11Jun 8, 2023Updated 2 years ago
- ☆16Sep 25, 2025Updated 4 months ago
- ☆12Oct 23, 2022Updated 3 years ago