Algorithmic-Alignment-Lab/CommonClaim

Explore, Establish, Exploit: Red Teaming Language Models from Scratch

☆15

Alternatives and similar repositories for CommonClaim

Users who are interested in CommonClaim are comparing it to the libraries listed below.

zhaoyiran924 / Probe-Sampling
[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
☆35 · Nov 8, 2024 · Updated last year
ZhaolinGao / REFUEL
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆25 · Oct 8, 2024 · Updated last year
flageval-baai / HalluDial
☆21 · Aug 19, 2024 · Updated last year
claws-lab / casper
Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"
☆10 · Aug 16, 2022 · Updated 3 years ago
sejoonoh / ATR
Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems"
☆12 · Aug 1, 2024 · Updated last year
OpenLMLab / Sniffer
☆27 · Jun 5, 2023 · Updated 3 years ago
mattynaz / latex-notes
A LaTeX document class for notes 📝 and textbooks 📚
☆14 · Jul 14, 2021 · Updated 5 years ago
liuzrcc / AIP
Adversarial Item Promotion in visually-aware recommenders
☆17 · Sep 3, 2021 · Updated 4 years ago
llm-platform-security / chatgpt-plugin-eval
LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins
☆29 · Jul 29, 2024 · Updated last year
SWE-Gym / SWE-Bench-Fork
☆13 · Mar 5, 2025 · Updated last year
stefanholek / conditional
Conditionally enter a context manager
☆10 · Jul 4, 2026 · Updated 2 weeks ago
ICTMCG / FakingRecipe
Official Repository for "FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process", ACM MM 202…
☆65 · Oct 5, 2025 · Updated 9 months ago
shreyansh26 / Red-Teaming-Language-Models-with-Language-Models
A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022
☆35 · Oct 9, 2023 · Updated 2 years ago
sichunluo / RecRanker
[TOIS'24] "RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation"
☆16 · Dec 1, 2024 · Updated last year
ArnaudFickinger / adversarial-surprise
Explore and Control with Adversarial Surprise
☆10 · Jul 20, 2021 · Updated 5 years ago
facebookresearch / rlfh-gen-div
This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity
☆50 · Jan 19, 2024 · Updated 2 years ago
graphml-lab-pwr / lapeigvals
Implementation of the paper "Hallucination Detection in LLMs Using Spectral Features of Attention Maps"
☆16 · Oct 18, 2025 · Updated 9 months ago
alevine0 / randomizedAblation
Code for the paper "Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation" by Alexander Levine and Soheil Feizi.
☆10 · Aug 22, 2022 · Updated 3 years ago
wagner-group / prompt-injection-defense
Fine-tuning base models to build robust task-specific models
☆36 · Apr 11, 2024 · Updated 2 years ago
llylly / Robustra
A method for training neural networks that are provably robust to adversarial attacks. [IJCAI 2019]
☆10 · Sep 3, 2019 · Updated 6 years ago
HumanCompatibleAI / overcooked-hAI-exp
Overcooked-AI Experiment Psiturk Demo (for MTurk experiments)
☆13 · May 10, 2021 · Updated 5 years ago
ChangxinTian / RAPU
[KDD'21] Official PyTorch implementation for "Data Poisoning Attack against Recommender System Using Incomplete and Perturbed Data".
☆13 · Sep 19, 2021 · Updated 4 years ago
Improbable-AI / curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…
☆90 · Mar 15, 2024 · Updated 2 years ago
YichenZW / awesome-llm-diversity
A curated collection of research papers exploring diversity in Large Language Model text generation. This repository tracks cutting-edge …
☆15 · Jun 19, 2026 · Updated last month
RU-System-Software-and-Security / NONE
☆10 · Oct 31, 2022 · Updated 3 years ago
princeton-polaris-lab / Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
☆13 · Jun 20, 2025 · Updated last year
FuxiaoLiu / Twitter-Video-dataset
[EACL'23] COVID-VTS: Fact Extraction and Verification on Short Video Platforms
☆12 · Sep 26, 2023 · Updated 2 years ago
MiracleHH / RecommPoison
This is the code implementation for the paper "Data Poisoning Attacks to Deep Learning Based Recommender Systems"
☆17 · Sep 8, 2022 · Updated 3 years ago
sccn / ICLabel-Dataset
Dataset for training EEG IC classifiers.
☆14 · Aug 29, 2021 · Updated 4 years ago
NaNoGenMo / 2021
National Novel Generation Month, 2021 edition.
☆44 · Sep 30, 2023 · Updated 2 years ago
peterchenyipu / dp_attacker
☆17 · Sep 25, 2024 · Updated last year
cgraber / NLStruct
Code used to produce experimental results for the paper "Deep Structured Prediction with Nonlinear Output Activations"
☆11 · May 6, 2019 · Updated 7 years ago
huizhang-L / CodeChameleon
☆30 · Mar 20, 2024 · Updated 2 years ago
minerllabs / basalt_competition_baseline_submissions
☆11 · Sep 29, 2021 · Updated 4 years ago
AI-secure / Knowledge-Enhanced-Machine-Learning-Pipeline
Repository for Knowledge Enhanced Machine Learning Pipeline (KEMLP)
☆10 · Jun 5, 2021 · Updated 5 years ago
launchnlp / LitCab
☆25 · Jun 10, 2025 · Updated last year
Kaffaljidhmah2 / SpecDec_pp
Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
☆19 · Jul 10, 2025 · Updated last year
multimeric / koa-pg-session
A model implementation of sessions for koa using postgres as the backend
☆10 · Oct 16, 2017 · Updated 8 years ago
thu-coai / CDConv
Data and code for EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations"
☆13 · May 8, 2023 · Updated 3 years ago