ahans30 / Binoculars
[ICML 2024] Binoculars: Zero-Shot Detection of LLM-Generated Text
☆213 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for Binoculars
- Improving Alignment and Robustness with Circuit Breakers ☆154 · Updated last month
- Evaluating LLMs with fewer examples ☆135 · Updated 7 months ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆259 · Updated last month
- ☆128 · Updated this week
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆125 · Updated last month
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆179 · Updated 5 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆148 · Updated last month
- A toolkit for describing model features and intervening on those features to steer behavior. ☆106 · Updated last week
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆74 · Updated this week
- Does Refusal Training in LLMs Generalize to the Past Tense? [NeurIPS 2024 Safe Generative AI Workshop (Oral)] ☆57 · Updated last month
- Red-Teaming Language Models with DSPy ☆142 · Updated 7 months ago
- ☆101 · Updated 3 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024] ☆220 · Updated 2 months ago
- A simple unified framework for evaluating LLMs ☆145 · Updated last week
- code for training & evaluating Contextual Document Embedding models ☆119 · Updated this week
- Code accompanying "How I learned to start worrying about prompt formatting". ☆95 · Updated last month
- awesome synthetic (text) datasets ☆242 · Updated 3 weeks ago
- This repository includes the official implementation of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs. ☆99 · Updated this week
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 · Updated last year
- ☆150 · Updated 4 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆109 · Updated 3 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆158 · Updated last month
- Manage scalable open LLM inference endpoints in Slurm clusters ☆237 · Updated 4 months ago
- ☆126 · Updated 7 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 · Updated 10 months ago
- Low-Rank adapter extraction for fine-tuned transformers model ☆162 · Updated 6 months ago
- ☆107 · Updated this week
- ☆184 · Updated last month
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆236 · Updated last month
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆82 · Updated 6 months ago