alisawuffles / tokenizer-attack
Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?"
☆14 Updated last month
Alternatives and similar repositories for tokenizer-attack
Users who are interested in tokenizer-attack are comparing it to the repositories listed below.
- https://footprints.baulab.info ☆17 Updated 8 months ago
- In-context Example Selection with Influences ☆15 Updated 2 years ago
- Official Repository for Dataset Inference for LLMs ☆34 Updated 11 months ago
- ☆54 Updated 2 years ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆35 Updated 4 months ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 Updated last year
- MergeBench: A Benchmark for Merging Domain-Specialized LLMs ☆14 Updated last month
- ☆44 Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆30 Updated 5 months ago
- ☆19 Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆98 Updated 4 months ago
- ☆44 Updated 4 months ago
- Pytorch Datasets for Easy-To-Hard ☆27 Updated 5 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆17 Updated 10 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆94 Updated last year
- ☆13 Updated 2 years ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆37 Updated 10 months ago
- ☆31 Updated last year
- ☆35 Updated 6 months ago
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆18 Updated last year
- NeurIPS'24 - LLM Safety Landscape ☆22 Updated 4 months ago
- ☆23 Updated 10 months ago
- ☆10 Updated last year
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies ☆23 Updated 10 months ago
- ☆44 Updated 3 months ago
- Applies ROME and MEMIT on Mamba-S4 models ☆14 Updated last year
- ☆35 Updated 2 years ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆47 Updated 8 months ago
- ☆26 Updated last year
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆76 Updated last month