microsoft / TaskTracker

TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides two methods for this: a simple linear probe and a more sophisticated metric-learning method. The project also releases the computationally expensive activation data to stimulate further AI safety research…
43 · Updated last month
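
To make the linear-probe idea in the description concrete, here is a minimal sketch, not TaskTracker's actual code. It trains a logistic-regression probe to separate "clean" from "drifted" activation vectors. Everything in it is an assumption made for illustration: the activations are synthetic Gaussian vectors rather than real LLM activations, and the helper names (`make_synthetic_activations`, `train_linear_probe`, `probe_score`) are hypothetical.

```python
import math
import random


def make_synthetic_activations(n, dim, drifted, rng):
    """Hypothetical stand-in for real activation data.

    Clean vectors are centred at -0.5 per dimension and drifted vectors at
    +0.5, so the two classes differ along a single fixed direction.
    """
    mean = 0.5 if drifted else -0.5
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w, b, x):
    """Probability, under the probe, that vector x comes from a drifted run."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_probe(X, y, lr=0.1, epochs=100):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = probe_score(w, b, x) - label  # gradient of log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    hits = sum((probe_score(w, b, x) >= 0.5) == bool(label) for x, label in zip(X, y))
    return hits / len(X)


rng = random.Random(0)
dim = 16  # assumed toy dimensionality; real hidden states are far larger

train_X = (make_synthetic_activations(200, dim, False, rng)
           + make_synthetic_activations(200, dim, True, rng))
train_y = [0] * 200 + [1] * 200
test_X = (make_synthetic_activations(100, dim, False, rng)
          + make_synthetic_activations(100, dim, True, rng))
test_y = [0] * 100 + [1] * 100

w, b = train_linear_probe(train_X, train_y)
test_acc = accuracy(w, b, test_X, test_y)
print(f"held-out accuracy: {test_acc:.2f}")
```

In practice the synthetic vectors would be replaced by activations extracted from a model, and the probe would typically be fitted with an existing library rather than a hand-written loop. The sketch only shows the shape of the technique: a single linear layer plus a sigmoid, trained on labelled activation vectors.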

Alternatives and similar repositories for TaskTracker:

Users who are interested in TaskTracker are comparing it to the libraries listed below.