ArGintum / GPTID
Official code repository for the article "Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts"
☆26 Updated last year
Alternatives and similar repositories for GPTID:
Users who are interested in GPTID are comparing it to the repositories listed below:
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆80 Updated 2 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 Updated 7 months ago
- ☆211 Updated this week
- ☆109 Updated 5 months ago
- Framework for probing tasks ☆25 Updated 10 months ago
- ☆75 Updated 5 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆65 Updated 10 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆84 Updated last year
- ☆43 Updated 5 months ago
- ☆140 Updated this week
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆88 Updated this week
- ☆86 Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆19 Updated 2 weeks ago
- ☆94 Updated 7 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆66 Updated 2 months ago
- A library for efficient patching and automatic circuit discovery. ☆48 Updated 2 months ago
- ☆59 Updated 9 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆51 Updated 10 months ago
- Sparse probing paper full code. ☆54 Updated last year
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆79 Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆135 Updated 3 months ago
- ☆17 Updated last month
- Replicating O1 inference-time scaling laws ☆73 Updated 2 months ago
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating ☆34 Updated this week
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆69 Updated last year
- Data and code for the Corr2Cause paper (ICLR 2024) ☆92 Updated 9 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆32 Updated 2 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆156 Updated 3 months ago
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆83 Updated 2 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 Updated last year