ahans30 / Binoculars

[ICML 2024] Binoculars: Zero-Shot Detection of LLM-Generated Text

☆320

Alternatives and similar repositories for Binoculars

Users who are interested in Binoculars are comparing it to the repositories listed below.

vivek3141 / ghostbuster
Ghostbuster: Detecting Text Ghostwritten by Large Language Models (NAACL 2024)
☆166 Updated last year
Data-Provenance-Initiative / Data-Provenance-Collection
☆256 Updated 7 months ago
msclar / formatspread
Code accompanying "How I learned to start worrying about prompt formatting".
☆110 Updated 5 months ago
Mihaiii / llm_steer
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆248 Updated 9 months ago
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆242 Updated last year
vec2text / vec2text
utilities for decoding deep representations (like sentence embeddings) back to text
☆990 Updated 3 months ago
vinusankars / Reliability-of-AI-text-detectors
Can AI-Generated Text be Reliably Detected?
☆86 Updated 2 years ago
martiansideofthemoon / ai-detection-paraphrases
Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense…
☆179 Updated 2 years ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆89 Updated last year
CHATS-lab / persuasive_jailbreaker
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆334 Updated last month
liamdugan / raid
RAID is the largest and most challenging benchmark for AI-generated text detection. (ACL 2024)
☆96 Updated last month
MadryLab / context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
☆300 Updated last year
jwkirchenbauer / lm-watermarking
☆643 Updated 2 months ago
tml-epfl / llm-past-tense
Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]
☆77 Updated 10 months ago
lukasberglund / reversal_curse
☆297 Updated 2 years ago
tml-epfl / llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆361 Updated 10 months ago
andyrdt / refusal_direction
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
☆299 Updated 5 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆129 Updated 9 months ago
cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆174 Updated 6 months ago
allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆94 Updated 11 months ago
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆168 Updated last year
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆194 Updated last year
Libr-AI / do-not-answer
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
☆295 Updated last year
samrawal / llama2_chat_templater
Wrapper to easily generate the chat template for Llama2
☆65 Updated last year
eric-mitchell / detect-gpt
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
☆440 Updated 2 years ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆92 Updated last year
junchaoIU / LLM-generated-Text-Detection
A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current…
☆79 Updated last year
shengliu66 / ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
☆192 Updated 9 months ago
iamgroot42 / mimir
Python package for measuring memorization in LLMs.
☆173 Updated 4 months ago
microsoft / TOXIGEN
This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
☆339 Updated last year