AI Safety Graph
☆18Mar 20, 2025Updated 11 months ago
Alternatives and similar repositories for AISafetyGraph
Users who are interested in AISafetyGraph are comparing it to the repositories listed below
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆16Dec 10, 2024Updated last year
- [LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges☆14May 30, 2024Updated last year
- ☆12Jul 12, 2024Updated last year
- We live in a colorful world, but how much do you really know about color? Your eyes may deceive you, while the sensors don’t lie. This AS7…☆12Jan 20, 2022Updated 4 years ago
- Important ideas☆18Oct 13, 2025Updated 4 months ago
- Benchmark to estimate model sycophancy☆22Nov 30, 2025Updated 3 months ago
- 📚📚📚📚📚📚📚📚📚 Reading everything☆15Sep 12, 2025Updated 5 months ago
- data and tools related to corona virus research☆11Feb 24, 2025Updated last year
- Automatically turn your handwritten journal entries into a website using GPT3 OCR python and html☆13Dec 15, 2021Updated 4 years ago
- ☆20May 25, 2024Updated last year
- The Happy Faces Benchmark☆15Jul 20, 2023Updated 2 years ago
- A new algorithm that formulates jailbreaking as a reasoning problem.☆26Jul 2, 2025Updated 8 months ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.☆25Jan 26, 2024Updated 2 years ago
- Measuring the situational awareness of language models☆40Feb 12, 2024Updated 2 years ago
- LessWrong Ebook Library☆53Feb 13, 2023Updated 3 years ago
- An attribution library for LLMs☆46Sep 17, 2024Updated last year
- ☆50Aug 3, 2024Updated last year
- ☆48May 9, 2024Updated last year
- ☆47May 21, 2024Updated last year
- ☆58Feb 12, 2024Updated 2 years ago
- REALSumm: Re-evaluating Evaluation in Text Summarization☆73Sep 22, 2025Updated 5 months ago
- A tool to find redirection chains in multiple URLs☆78Jan 1, 2025Updated last year
- Utilities for the HuggingFace transformers library☆75Jan 21, 2023Updated 3 years ago
- Low latency Limit Order Book and Matching Engine created in C++, able to handle over 1.4 million transactions per second.☆139Jun 12, 2024Updated last year
- Machine Learning for Alignment Bootcamp☆82Apr 27, 2022Updated 3 years ago
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments.☆158Feb 27, 2026Updated last week
- ☆120Jan 19, 2026Updated last month
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.☆116Jun 13, 2024Updated last year
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske…☆129Mar 1, 2024Updated 2 years ago
- ☆147Jul 23, 2025Updated 7 months ago
- Steering vectors for transformer language models in PyTorch / Hugging Face☆140Feb 21, 2025Updated last year
- ☆134Oct 28, 2023Updated 2 years ago
- This repo is part of a tutorial about writing microservices using Spring Boot☆140Sep 1, 2018Updated 7 years ago
- A high-performance, thread-safe limit order book implementation written in Rust. This project provides a comprehensive order matching eng…☆300Updated this week
- Optimal control of risk aversion in Avellaneda Stoikov high frequency market making model with Soft Actor Critic reinforcement learning☆148Dec 28, 2019Updated 6 years ago
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).☆176Oct 27, 2023Updated 2 years ago
- Language model alignment-focused deep learning curriculum☆1,537Aug 19, 2024Updated last year
- spring-security-tutorial☆172Apr 20, 2022Updated 3 years ago
- ☆960Updated this week