Explore, Establish, Exploit: Red Teaming Language Models from Scratch
☆13Jun 21, 2023Updated 2 years ago
Alternatives and similar repositories for CommonClaim
Users that are interested in CommonClaim are comparing it to the libraries listed below
- ☆31Jul 14, 2023Updated 2 years ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF☆24Oct 8, 2024Updated last year
- ☆21Aug 19, 2024Updated last year
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling☆33Nov 8, 2024Updated last year
- A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022☆35Oct 9, 2023Updated 2 years ago
- ☆10Feb 2, 2026Updated 3 weeks ago
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues☆13Feb 7, 2024Updated 2 years ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆16Nov 19, 2025Updated 3 months ago
- source files for GloBI website☆10Feb 17, 2026Updated last week
- Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"☆10Aug 16, 2022Updated 3 years ago
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals☆12May 24, 2024Updated last year
- FamilyTool benchmark☆12Sep 10, 2025Updated 5 months ago
- ☆12Oct 1, 2025Updated 4 months ago
- ☆12Mar 5, 2025Updated 11 months ago
- a Video Quality Analysis Toolkit☆13May 16, 2025Updated 9 months ago
- Scripts for KGIRNet model for ESWC☆10Jul 6, 2023Updated 2 years ago
- Sound Separation, Omni modal☆28Sep 15, 2025Updated 5 months ago
- ☆12Jan 4, 2024Updated 2 years ago
- ☆10Oct 31, 2022Updated 3 years ago
- Applied Data Science training course (for updates and resources, read the ReadMe file below)☆15Sep 9, 2023Updated 2 years ago
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training"☆11Oct 27, 2025Updated 4 months ago
- A Python tool that helps you interact with ChatGPT.☆10Dec 11, 2022Updated 3 years ago
- Information Extraction related tools and models☆10Mar 16, 2023Updated 2 years ago
- This repository contains the code for all figures in the paper "General Pitfalls of Model-agnostic Interpretation Methods for Machine Lea…☆15Aug 17, 2021Updated 4 years ago
- This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…☆11Aug 24, 2022Updated 3 years ago
- ☆16Mar 22, 2025Updated 11 months ago
- Code for the paper "Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation" by Alexander Levine and Soheil Feizi.☆10Aug 22, 2022Updated 3 years ago
- Code necessary to reproduce experiments in "FloraBERT: cross-species transfer learning with attention-based neural networks for gene expr…☆13Jul 6, 2022Updated 3 years ago
- Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation☆28Dec 10, 2025Updated 2 months ago
- This is the implementation of paper "Learning to Ask Conversational Questions by Optimizing Levenshtein Distance".☆10Jul 5, 2021Updated 4 years ago
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 8 months ago
- ☆12Jan 25, 2024Updated 2 years ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity☆47Jan 19, 2024Updated 2 years ago
- The official Genbench Collaborative Benchmarking Task repository 2023 (Archived)☆14Jul 23, 2024Updated last year
- Official codebase for the NeurIPS 2023 paper: Towards Last-layer Retraining for Group Robustness with Fewer Annotations. https://arxiv.or…☆11May 15, 2024Updated last year
- ☆12Jan 2, 2024Updated 2 years ago
- A model implementation of sessions for koa using postgres as the backend☆10Oct 16, 2017Updated 8 years ago
- Word2vec Model Reader for Node.js Client☆13May 8, 2019Updated 6 years ago
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua…☆13Nov 11, 2024Updated last year