TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a simple linear probe-based method and a more sophisticated metric learning method to achieve this. The project also releases the computationally expensive activation data to stimulate further AI safety research…
☆85 · Sep 1, 2025 · Updated 6 months ago
Alternatives and similar repositories for TaskTracker
Users who are interested in TaskTracker are comparing it to the repositories listed below.
- CCS 2023 | Explainable malware and vulnerability detection with XAI in paper "FINER: Enhancing State-of-the-art Classifiers with Feature … ☆11 · Aug 20, 2024 · Updated last year
- Code for our NAACL 2025 accepted paper: Attention Tracker: Detecting Prompt Injection Attacks in LLMs ☆23 · Sep 19, 2025 · Updated 6 months ago
- Code for paper "Poisoned classifiers are not only backdoored, they are fundamentally broken" ☆26 · Jan 7, 2022 · Updated 4 years ago
- ☆29 · Aug 31, 2025 · Updated 7 months ago
- General research for Dreadnode ☆26 · Jun 17, 2024 · Updated last year
- Code for the paper "Firewalls to Secure Dynamic LLM Agentic Networks" ☆30 · Jun 6, 2025 · Updated 9 months ago
- ☆21 · Mar 18, 2025 · Updated last year
- ☆24 · May 28, 2025 · Updated 10 months ago
- ☆128 · Jul 2, 2024 · Updated last year
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆499 · Mar 12, 2026 · Updated 2 weeks ago
- ☆14 · Jun 6, 2023 · Updated 2 years ago
- Code & Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆112 · Sep 27, 2024 · Updated last year
- ☆36 · May 21, 2025 · Updated 10 months ago
- ☆32 · Sep 11, 2025 · Updated 6 months ago
- Code for the paper "Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers" ☆60 · Apr 29, 2022 · Updated 3 years ago
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge ☆40 · Sep 17, 2025 · Updated 6 months ago
- Official Repository of the Entity-based Reinforcement Learning for Autonomous Cyber Defence paper. ☆18 · Jan 24, 2025 · Updated last year
- ☆15 · Jun 7, 2024 · Updated last year
- Distribution Preserving Backdoor Attack in Self-supervised Learning ☆20 · Jan 27, 2024 · Updated 2 years ago
- Universal Robustness Evaluation Toolkit (for Evasion) ☆32 · Sep 17, 2025 · Updated 6 months ago
- ☆59 · Mar 9, 2023 · Updated 3 years ago
- Example agents for the Dreadnode platform ☆25 · Dec 19, 2025 · Updated 3 months ago
- This repository contains code and data of the paper **On the Limitations of Continual Learning for Malware Classification**, accepted to … ☆19 · Dec 29, 2023 · Updated 2 years ago
- PrivacyAsst: Safeguarding User Privacy in Tool-Using Large Language Model Agents (TDSC 2024) ☆19 · Mar 29, 2024 · Updated 2 years ago
- Erlang/OTP MTA (Mail Transfer Agent) ☆29 · Jul 8, 2014 · Updated 11 years ago
- ☆27 · Sep 15, 2024 · Updated last year
- Practical Jupyter notebooks from Andrew Ng and Giskard team's "Red Teaming LLM Applications" course on DeepLearning.AI. ☆23 · Apr 8, 2024 · Updated last year
- ☆21 · Nov 19, 2021 · Updated 4 years ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆115 · Apr 15, 2024 · Updated last year
- Notebooks and other stuff from my analysis of scientific literature I've found interesting ☆18 · Jul 3, 2017 · Updated 8 years ago
- [ACL 2025] The official implementation of the paper "PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free". ☆63 · Dec 4, 2025 · Updated 3 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆61 · Mar 11, 2025 · Updated last year
- Foolbox implementation for NeurIPS 2021 Paper: "Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints". ☆24 · Mar 16, 2022 · Updated 4 years ago
- ☆16 · Nov 30, 2022 · Updated 3 years ago
- ☆75 · Mar 30, 2025 · Updated last year
- Exploiting Jackson deserialization vulnerability with 3 gadgets ☆10 · May 3, 2021 · Updated 4 years ago
- [EMNLP 2022] Distillation-Resistant Watermarking (DRW) for Model Protection in NLP ☆13 · Aug 17, 2023 · Updated 2 years ago
- Data Poisoning in Deep Learning: A Survey ☆22 · Jan 18, 2026 · Updated 2 months ago
- TDSC 2022 | An explainable GNN-based Android malware detection system in paper "MsDroid: Identifying Malicious Snippets for Android Malwa… ☆65 · Feb 21, 2024 · Updated 2 years ago