ai-safety-graph / AISafetyGraph
AI Safety Graph
☆18 · Mar 20, 2025 · Updated 10 months ago
Alternatives and similar repositories for AISafetyGraph
Users who are interested in AISafetyGraph are comparing it to the repositories listed below.
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆16 · Dec 10, 2024 · Updated last year
- ☆12 · Jul 12, 2024 · Updated last year
- Benchmark to estimate model sycophancy ☆21 · Nov 30, 2025 · Updated 2 months ago
- [LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges ☆14 · May 30, 2024 · Updated last year
- data and tools related to corona virus research ☆11 · Feb 24, 2025 · Updated 11 months ago
- The Happy Faces Benchmark ☆15 · Jul 20, 2023 · Updated 2 years ago
- ☆34 · Feb 20, 2025 · Updated 11 months ago
- Measuring the situational awareness of language models ☆40 · Feb 12, 2024 · Updated 2 years ago
- 👜 Easily pick a place to store data for your Python code. ☆42 · Updated this week
- ☆50 · Aug 3, 2024 · Updated last year
- ☆48 · May 9, 2024 · Updated last year
- ☆47 · May 21, 2024 · Updated last year
- ☆58 · Feb 12, 2024 · Updated 2 years ago
- ☆65 · Updated this week
- ARCHIVED. Please use https://docs.adapterhub.ml/huggingface_hub.html || 🔌 A central repository collecting pre-trained adapter modules ☆69 · May 26, 2024 · Updated last year
- REALSumm: Re-evaluating Evaluation in Text Summarization ☆73 · Sep 22, 2025 · Updated 4 months ago
- Following emerging Large Language Model Operations (LLM Ops) best practices in the industry, you’ll learn all about the key technologies … ☆291 · Apr 11, 2024 · Updated last year
- Everything about LLMs in production. ☆78 · Jun 29, 2024 · Updated last year
- Utilities for the HuggingFace transformers library ☆74 · Jan 21, 2023 · Updated 3 years ago
- spk aka spritzgebaeck: A small OSINT/Recon tool to find CIDRs that belong to a specific organization. ☆84 · Jan 12, 2026 · Updated last month
- An index of all of our weekly concepts + code events for aspiring AI Engineers and Business Leaders!! ☆99 · Updated this week
- A curated list of algorithms and papers for auditing black-box algorithms. ☆112 · Oct 24, 2025 · Updated 3 months ago
- ☆118 · Jan 19, 2026 · Updated 3 weeks ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆116 · Jun 13, 2024 · Updated last year
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆128 · Mar 1, 2024 · Updated last year
- ☆146 · Jul 23, 2025 · Updated 6 months ago
- Papers and resources related to the security and privacy of LLMs 🤖 ☆561 · Jun 8, 2025 · Updated 8 months ago
- A library for finding knowledge neurons in pretrained transformer models. ☆159 · Feb 13, 2022 · Updated 4 years ago
- METR Task Standard ☆173 · Feb 3, 2025 · Updated last year
- ☆929 · Feb 4, 2026 · Updated last week
- Make URL path combinations using a wordlist ☆169 · Sep 25, 2023 · Updated 2 years ago
- A suite of test scenarios for multi-agent reinforcement learning. ☆784 · Feb 1, 2026 · Updated last week
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆217 · Jan 26, 2026 · Updated 2 weeks ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆240 · Dec 16, 2024 · Updated last year
- A numeric optimization package for Torch. ☆196 · Nov 27, 2017 · Updated 8 years ago
- ☆228 · Feb 23, 2021 · Updated 4 years ago
- Collection of evals for Inspect AI ☆361 · Updated this week
- Erasing concepts from neural representations with provable guarantees ☆243 · Jan 27, 2025 · Updated last year
- A library for generative social simulation ☆1,178 · Feb 6, 2026 · Updated last week