☆32 · Oct 21, 2025 · Updated 5 months ago
Alternatives and similar repositories for selfplay-redteaming
Users interested in selfplay-redteaming are comparing it to the libraries listed below.
- Official release of code for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks" ☆42 · Feb 11, 2026 · Updated last month
- A library for soft differentiable relaxations of common JAX functions. ☆44 · Updated this week
- Compositional Abstractions Tutorial ☆13 · Nov 26, 2023 · Updated 2 years ago
- ☆12 · Apr 26, 2024 · Updated last year
- ☆15 · Aug 19, 2025 · Updated 7 months ago
- Solving the OpenAI Gym (MountainCarContinuous-v0) with DDPG ☆21 · Jan 23, 2023 · Updated 3 years ago
- https://interactivetraining.ai/ ☆17 · Oct 2, 2025 · Updated 5 months ago
- TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition ☆26 · Feb 5, 2026 · Updated last month
- Repository for the paper: "TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining" ACL Oral 2025 ☆22 · Mar 6, 2026 · Updated 2 weeks ago
- Solving some interesting problems using Python and C++ ☆14 · Aug 16, 2020 · Updated 5 years ago
- ☆17 · May 19, 2023 · Updated 2 years ago
- [NAACL 2025] Official Code Repository for the paper "Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval" ☆20 · Jul 13, 2025 · Updated 8 months ago
- This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur… ☆11 · Aug 24, 2022 · Updated 3 years ago
- Official Repository for Task-Circuit Quantization ☆24 · Jun 1, 2025 · Updated 9 months ago
- The Harmonic Memory ☆16 · Oct 18, 2023 · Updated 2 years ago
- Prompt + regex lab ☆10 · Nov 22, 2023 · Updated 2 years ago
- Make open-weight LLM agents play the game "Among Us", and study how the models learn and express lying and deception in the game. ☆28 · Dec 17, 2025 · Updated 3 months ago
- Scratchpad/Chain-of-Thought Prompts ☆12 · Jun 6, 2022 · Updated 3 years ago
- CR-LT KGQA Dataset Repository ☆10 · Jun 1, 2025 · Updated 9 months ago
- ☆10 · May 27, 2024 · Updated last year
- ☆12 · May 27, 2022 · Updated 3 years ago
- HumanLM: Simulating Users with State Alignment Beats Response Imitation ☆67 · Feb 27, 2026 · Updated 3 weeks ago
- On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory ☆10 · Aug 19, 2021 · Updated 4 years ago
- Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing ☆14 · Feb 18, 2021 · Updated 5 years ago
- Code for "Visualizing and Understanding Object Detector" ☆20 · Jun 24, 2021 · Updated 4 years ago
- [USENIX Security 2025] SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks ☆20 · Sep 18, 2025 · Updated 6 months ago
- A PyTorch implementation of the paper "Provably Efficient Online RLHF with One-Pass Reward Modeling". This repository provides a flexible… ☆89 · Dec 13, 2025 · Updated 3 months ago
- Reading Group @mila-iqia on Computational Optimal Transport for Machine Learning Applications ☆13 · Jun 3, 2022 · Updated 3 years ago
- The official Python wrapper for the EBSCO Discovery Service API ☆15 · Jul 26, 2024 · Updated last year
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆30 · Jun 30, 2025 · Updated 8 months ago
- Official repo for FSE'24 paper "CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking" ☆18 · Mar 10, 2025 · Updated last year
- Code for the paper "Optimizing Mode Connectivity via Neuron Alignment" from NeurIPS 2020. ☆16 · Dec 10, 2020 · Updated 5 years ago
- A web app for easily printing card game proxies (copy cards). ☆16 · May 13, 2025 · Updated 10 months ago
- 99 problems, but a driver ain't one. (Push code, not buggies) ☆26 · Oct 12, 2020 · Updated 5 years ago
- ☆11 · Feb 21, 2022 · Updated 4 years ago
- The Python programming language ☆51 · Dec 19, 2025 · Updated 3 months ago
- Forced alignment for karaokes ☆18 · Updated this week
- ☆17 · Sep 4, 2024 · Updated last year
- FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package bu … ☆13 · Apr 25, 2024 · Updated last year