TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides both a simple linear-probe method and a more sophisticated metric-learning method for this detection. The project also releases the computationally expensive activation data to stimulate further AI safety research…
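As a rough illustration of the linear-probe idea described above (this is not TaskTracker's actual code: the data, dimensionality, mean shift, and threshold below are invented stand-ins for real LLM activation deltas), a minimal mass-mean probe separates "clean" from "drifted" activation vectors with a single linear direction:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64  # stand-in for a hidden-state dimension; real models use thousands

# Synthetic "activation delta" vectors: clean examples cluster near zero,
# drifted examples (e.g. after an injected instruction) are shifted.
clean = rng.normal(0.0, 1.0, size=(200, dim))
drift = rng.normal(0.8, 1.0, size=(200, dim))

# Mass-mean linear probe: the weight vector is the difference of class
# means; the decision threshold is the midpoint of the projected means.
w = drift.mean(axis=0) - clean.mean(axis=0)
threshold = 0.5 * ((clean @ w).mean() + (drift @ w).mean())

def predict(x):
    """Label 1 (drifted) when the projection exceeds the threshold."""
    return (x @ w > threshold).astype(int)

X = np.vstack([clean, drift])
y = np.array([0] * 200 + [1] * 200)
accuracy = (predict(X) == y).mean()
```

On well-separated synthetic clusters like these the probe is near-perfect; the hard part in practice is collecting labelled activations, which is exactly the data the project releases.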
☆80 · Sep 1, 2025 · Updated 5 months ago
Alternatives and similar repositories for TaskTracker
Users who are interested in TaskTracker are comparing it to the repositories listed below
- CCS 2023 | Explainable malware and vulnerability detection with XAI in paper "FINER: Enhancing State-of-the-art Classifiers with Feature … ☆11 · Aug 20, 2024 · Updated last year
- Code for the API, workload execution, and agents underlying the LLMail-Inject Adaptive Prompt Injection Challenge ☆19 · Feb 13, 2026 · Updated 2 weeks ago
- General research for Dreadnode ☆27 · Jun 17, 2024 · Updated last year
- ☆29 · Aug 31, 2025 · Updated 6 months ago
- Code for paper "Poisoned classifiers are not only backdoored, they are fundamentally broken" ☆26 · Jan 7, 2022 · Updated 4 years ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆86 · Jul 24, 2025 · Updated 7 months ago
- CLI that queries multiple language models in parallel using prompts from a CSV file ☆28 · Sep 24, 2025 · Updated 5 months ago
- Code for our NAACL 2025 accepted paper: Attention Tracker: Detecting Prompt Injection Attacks in LLMs ☆23 · Sep 19, 2025 · Updated 5 months ago
- ☆10 · Sep 24, 2021 · Updated 4 years ago
- Official implementation of the WASP web agent security benchmark ☆70 · Aug 12, 2025 · Updated 6 months ago
- Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses ☆29 · Feb 11, 2025 · Updated last year
- Code to generate NeuralExecs (prompt injection for LLMs) ☆27 · Oct 5, 2025 · Updated 4 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆443 · Feb 3, 2026 · Updated 3 weeks ago
- PrivacyAsst: Safeguarding User Privacy in Tool-Using Large Language Model Agents (TDSC 2024) ☆18 · Mar 29, 2024 · Updated last year
- ☆35 · May 21, 2025 · Updated 9 months ago
- ☆27 · Sep 11, 2025 · Updated 5 months ago
- ☆14 · Jun 6, 2023 · Updated 2 years ago
- Set-of-Mark Prompting for LMMs ☆13 · Jun 5, 2024 · Updated last year
- ☆14 · Jan 7, 2024 · Updated 2 years ago
- Code & Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆109 · Sep 27, 2024 · Updated last year
- ☆117 · Jul 2, 2024 · Updated last year
- ☆60 · Mar 9, 2023 · Updated 2 years ago
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge ☆39 · Sep 17, 2025 · Updated 5 months ago
- Example agents for the Dreadnode platform ☆22 · Dec 19, 2025 · Updated 2 months ago
- ☆17 · Nov 30, 2022 · Updated 3 years ago
- Official Repository of the Entity-based Reinforcement Learning for Autonomous Cyber Defence paper. ☆18 · Jan 24, 2025 · Updated last year
- [Findings of ACL 2023] Bridge the Gap Between CV and NLP! An Optimization-based Textual Adversarial Attack Framework. ☆14 · Aug 27, 2023 · Updated 2 years ago
- ☆16 · May 26, 2021 · Updated 4 years ago
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024) ☆34 · Jun 29, 2025 · Updated 7 months ago
- Lightweight agentic coding environment ☆12 · Jan 17, 2026 · Updated last month
- [IEEE S&P'24] ODSCAN: Backdoor Scanning for Object Detection Models ☆20 · Oct 5, 2025 · Updated 4 months ago
- [ACL 2025] Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms ☆35 · Jun 4, 2025 · Updated 8 months ago
- 📚 Build knowledge bases for RAG ☆31 · Jul 3, 2025 · Updated 7 months ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆104 · Apr 15, 2024 · Updated last year
- This repository is the official implementation of the paper "ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning… ☆19 · Jun 7, 2023 · Updated 2 years ago
- [S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models ☆19 · Feb 18, 2025 · Updated last year
- ToolFuzz is a fuzzing framework designed to test your LLM Agent tools. ☆37 · Jul 20, 2025 · Updated 7 months ago
- Concealed Data Poisoning Attacks on NLP Models ☆21 · Sep 4, 2023 · Updated 2 years ago
- FLTracer: Accurate Poisoning Attack Provenance in Federated Learning ☆24 · Jun 14, 2024 · Updated last year