ApolloResearch / insider-trading

☆53

Alternatives and similar repositories for insider-trading

Users who are interested in insider-trading are comparing it to the repositories listed below.

ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆90 · Updated last year
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71 · Updated last year
google-deepmind / dangerous-capability-evaluations
☆62 · Updated 2 months ago
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆51 · Updated 7 months ago
METR / RE-Bench
☆119 · Updated last month
MinhxLe / subliminal-learning
☆104 · Updated 3 months ago
kanishkg / stream-of-search
Repository for the paper "Stream of Search: Learning to Search in Language"
☆151 · Updated 9 months ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 · Updated last year
vinid / NegotiationArena
☆79 · Updated last year
microsoft / mechanistic-error-probe
A mechanistic approach for understanding and detecting factual errors of large language models.
☆48 · Updated last year
google-deepmind / mishax
☆143 · Updated 2 months ago
ExtensityAI / benchmark
Evaluation of neuro-symbolic engines
☆39 · Updated last year
KihoPark / LLM_Categorical_Hierarchical_Representations
☆111 · Updated 9 months ago
centerforaisafety / emergent-values
Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"
☆83 · Updated 9 months ago
allenai / hybrid-preferences
Learning to route instances for Human vs AI Feedback (ACL Main '25)
☆26 · Updated 4 months ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆84 · Updated last year
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 · Updated last year
arcee-ai / DAM
☆55 · Updated last year
KaiNylund / lm-weights-encode-time
☆69 · Updated last year
oriyor / assistantbench
Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆66 · Updated 11 months ago
allenai / infinigram-api
☆87 · Updated this week
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆129 · Updated 9 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆214 · Updated last week
JacobPfau / fillerTokens
☆75 · Updated last year
Alex-Gurung / ReasoningNCP
Official repository for the paper "Learning to Reason for Long-Form Story Generation"
☆72 · Updated 7 months ago
meg-tong / sycophancy-eval
Datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆97 · Updated 2 years ago
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆122 · Updated last year
haizelabs / bijection-learning
☆26 · Updated last year
microsoft / stop
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
☆48 · Updated last year
seacowx / OpenToM
The official repository of the OpenToM dataset
☆27 · Updated 9 months ago