TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a simple linear probe-based method and a more sophisticated metric learning method to achieve this. The project also releases the computationally expensive activation data to stimulate further AI safety research…
☆84Sep 1, 2025Updated 6 months ago
Alternatives and similar repositories for TaskTracker
Users who are interested in TaskTracker are comparing it to the libraries listed below:
- Code for the API, workload execution, and agents underlying the LLMail-Inject Adaptive Prompt Injection Challenge☆21Mar 1, 2026Updated 2 weeks ago
- CCS 2023 | Explainable malware and vulnerability detection with XAI in paper "FINER: Enhancing State-of-the-art Classifiers with Feature …☆11Aug 20, 2024Updated last year
- ☆18Mar 15, 2024Updated 2 years ago
- Code for our NAACL2025 accepted paper: Attention Tracker: Detecting Prompt Injection Attacks in LLMs☆23Sep 19, 2025Updated 6 months ago
- ☆29Aug 31, 2025Updated 6 months ago
- [ACL 2025] Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms☆38Jun 4, 2025Updated 9 months ago
- Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses☆30Feb 11, 2025Updated last year
- Code to generate NeuralExecs (prompt injection for LLMs)☆27Oct 5, 2025Updated 5 months ago
- ☆23May 28, 2025Updated 9 months ago
- RAB: Provable Robustness Against Backdoor Attacks☆39Oct 3, 2023Updated 2 years ago
- ☆121Jul 2, 2024Updated last year
- [ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents☆32Jun 24, 2025Updated 8 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.☆488Mar 12, 2026Updated last week
- Official implementation of the WASP web agent security benchmark☆75Aug 12, 2025Updated 7 months ago
- ☆14Jun 6, 2023Updated 2 years ago
- [NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agen…☆41Updated this week
- ☆32Sep 11, 2025Updated 6 months ago
- [Findings of ACL 2023] Bridge the Gap Between CV and NLP! An Optimization-based Textual Adversarial Attack Framework.☆14Aug 27, 2023Updated 2 years ago
- Code for the paper Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers☆60Apr 29, 2022Updated 3 years ago
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge☆39Sep 17, 2025Updated 6 months ago
- CLI that queries multiple language models in parallel using prompts from a CSV file☆28Sep 24, 2025Updated 5 months ago
- Official Repository of the Entity-based Reinforcement Learning for Autonomous Cyber Defence paper.☆18Jan 24, 2025Updated last year
- Distribution Preserving Backdoor Attack in Self-supervised Learning☆20Jan 27, 2024Updated 2 years ago
- ☆29Feb 27, 2025Updated last year
- ☆60Mar 9, 2023Updated 3 years ago
- This repository contains code and data of the paper **On the Limitations of Continual Learning for Malware Classification**, accepted to …☆19Dec 29, 2023Updated 2 years ago
- Erlang/OTP MTA (Mail Transfer Agent)☆29Jul 8, 2014Updated 11 years ago
- ☆27Sep 15, 2024Updated last year
- Code repository for the paper --- [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples☆30Jul 11, 2023Updated 2 years ago
- ☆28Jan 17, 2024Updated 2 years ago
- ☆22Nov 19, 2021Updated 4 years ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆112Apr 15, 2024Updated last year
- [ACL 2025] The official implementation of the paper "PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free".☆63Dec 4, 2025Updated 3 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆61Mar 11, 2025Updated last year
- Foolbox implementation for NeurIPS 2021 Paper: "Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints".☆24Mar 16, 2022Updated 4 years ago
- ☆10Mar 10, 2026Updated last week
- ☆17Nov 30, 2022Updated 3 years ago
- ☆76Mar 30, 2025Updated 11 months ago
- [EMNLP 2022] Distillation-Resistant Watermarking (DRW) for Model Protection in NLP☆13Aug 17, 2023Updated 2 years ago