Code for our paper "Localizing Lying in Llama"
☆13 · Apr 24, 2025 · Updated 10 months ago
Alternatives and similar repositories for llama-lying
Users who are interested in llama-lying are comparing it to the libraries listed below.
- ☆10 · Oct 31, 2022 · Updated 3 years ago
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs · ☆13 · Jun 20, 2025 · Updated 8 months ago
- A Python SDK for LLM fine-tuning and inference on RunPod infrastructure · ☆20 · Feb 16, 2026 · Updated 2 weeks ago
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes · ☆12 · Jun 12, 2023 · Updated 2 years ago
- ☆12 · Jul 11, 2021 · Updated 4 years ago
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders" · ☆23 · Feb 6, 2025 · Updated last year
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography. · ☆25 · Jan 26, 2024 · Updated 2 years ago
- The official code for ICML 2024 "FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Recons… · ☆29 · Jun 6, 2024 · Updated last year
- Code repository for the paper --- [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples · ☆30 · Jul 11, 2023 · Updated 2 years ago
- Decoder only transformer, built from scratch with PyTorch · ☆33 · Oct 22, 2023 · Updated 2 years ago
- Code to break Llama Guard · ☆32 · Dec 7, 2023 · Updated 2 years ago
- The public web API of the National Museum of Australia · ☆11 · Sep 12, 2023 · Updated 2 years ago
- A machine learning model to recommend movies & tv series · ☆11 · Oct 26, 2020 · Updated 5 years ago
- AlgZoo: uninterpreted models with fewer than 1,500 parameters · ☆43 · Jan 19, 2026 · Updated last month
- ☆12 · Jul 8, 2024 · Updated last year
- Library on Arduino to add over the air (OTA) Update Capabilities to bw16/rtl8720DN · ☆11 · Aug 6, 2024 · Updated last year
- Run a raffle among the 🌟 stargazers 🌟 of a Github project! · ☆11 · Mar 23, 2023 · Updated 2 years ago
- ☆13 · Dec 16, 2024 · Updated last year
- Interactive and dynamic painting simulation in WebGL · ☆13 · May 2, 2024 · Updated last year
- Simple parsing of CSV into case classes in Scala · ☆11 · Feb 12, 2025 · Updated last year
- ☆13 · Jun 4, 2025 · Updated 9 months ago
- Provide a simple, script-friendly interface to posting stuff on matrix channels · ☆10 · Mar 9, 2018 · Updated 7 years ago
- A guide for those who are stuck not being able to migrate their apps away from JS/TypeScript/Flow. See index.tsx · ☆10 · Nov 15, 2017 · Updated 8 years ago
- ☆11 · Jan 4, 2023 · Updated 3 years ago
- Applying backdoor attacks to BadNet on MNIST and ResNet on CIFAR10. · ☆13 · Aug 25, 2021 · Updated 4 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes. · ☆21 · Feb 19, 2026 · Updated 2 weeks ago
- Our work on Reinforcement learning that we share with the rest of the world · ☆13 · Jan 7, 2019 · Updated 7 years ago
- Code used to produce experimental results for the paper "Deep Structured Prediction with Nonlinear Output Activations" · ☆11 · May 6, 2019 · Updated 6 years ago
- Kaggle Competition : IEEE-CIS-Fraud-Detection · ☆10 · Jan 18, 2020 · Updated 6 years ago
- A method for training neural networks that are provably robust to adversarial attacks. [IJCAI 2019] · ☆10 · Sep 3, 2019 · Updated 6 years ago
- Advent Of Code solutions in Haskell · ☆11 · Dec 8, 2019 · Updated 6 years ago
- The AI that helps you achieve your goals · ☆11 · Feb 4, 2024 · Updated 2 years ago
- ☆13 · May 7, 2023 · Updated 2 years ago
- Raw data from the collections database in json and csv format · ☆12 · Jul 26, 2022 · Updated 3 years ago
- A Hadoop MapReduce application that retrieves data/articles related to sports from sources like NY Times, Commoncrawl, and Twitter and c… · ☆13 · Oct 3, 2019 · Updated 6 years ago
- Streaming effects for PureScript · ☆16 · Nov 8, 2021 · Updated 4 years ago
- Repository for my dotfiles · ☆12 · Dec 3, 2025 · Updated 3 months ago
- Code Repository for the Paper --- Revisiting the Assumption of Latent Separability for Backdoor Defenses (ICLR 2023) · ☆47 · Feb 28, 2023 · Updated 3 years ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. · ☆217 · Feb 23, 2026 · Updated last week