EntropyLabsAI / sentinel

A control plane to oversee agents operating in the wild

☆22

Related projects

Alternatives and complementary repositories for sentinel

lingo-iitgn / ACM-SS-2024-GenAI
Repository for ACM India Summer School on Generative AI for Text
☆11 Updated 4 months ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆200 Updated 9 months ago
danielmamay / mlab
Machine Learning for Alignment Bootcamp (MLAB).
☆22 Updated 2 years ago
jjallaire / inspect-llm-workshop
☆47 Updated 5 months ago
METR / public-tasks
☆67 Updated 2 weeks ago
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆406 Updated this week
EleutherAI / sae
Sparse autoencoders
☆344 Updated last week
AgentTorch / AgentTorch
Large population models
☆214 Updated 3 weeks ago
allenai / fm-cheatsheet
Website for hosting the Open Foundation Models Cheat Sheet.
☆257 Updated 4 months ago
callummcdougall / ARENA_3.0
☆351 Updated this week
normal-computing / posteriors
Uncertainty quantification with PyTorch
☆329 Updated 2 weeks ago
gordicaleksa / serbian-llm-eval
Serbian LLM Eval.
☆88 Updated 8 months ago
Lightning-AI / dl-fundamentals
Deep Learning Fundamentals -- Code material and exercises
☆349 Updated 8 months ago
srush / Transformer-Puzzles
Puzzles for exploring transformers
☆325 Updated last year
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆119 Updated 5 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆200 Updated 4 months ago
EleutherAI / cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
☆717 Updated last month
HMUNACHI / nanodl
A Jax-based library for designing and training transformer models from scratch.
☆276 Updated 2 months ago
srush / Autodiff-Puzzles
☆391 Updated last month
METR / task-standard
METR Task Standard
☆124 Updated 3 weeks ago
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆64 Updated 2 weeks ago
rgreenblatt / arc_draw_more_samples_pub
Draw more samples
☆179 Updated 4 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆186 Updated this week
isamu-isozaki / huggingface-reading-group
This repository's goal is to precompile all past presentations of the Huggingface reading group
☆46 Updated 2 months ago
warner-benjamin / commented-transformers
Highly commented implementations of Transformers in PyTorch
☆128 Updated last year
muellerzr / minimal-trainer-zoo
Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines
☆195 Updated 6 months ago
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆63 Updated last year
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆289 Updated 3 weeks ago
google-deepmind / nanodo
☆197 Updated 4 months ago
jrzmnt / rl-vs-llm-chess
☆21 Updated last month