haizelabs / bijection-learning

☆26

Alternatives and similar repositories for bijection-learning

Users who are interested in bijection-learning are comparing it to the repositories listed below.

haizelabs / dspy-redteam
Red-Teaming Language Models with DSPy
☆235 Updated 9 months ago
haizelabs / sphynx
Sphynx Hallucination Induction
☆53 Updated 9 months ago
haizelabs / get-haized
A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.
☆98 Updated 7 months ago
redteaming-arena / redteam-arena
☆34 Updated 5 months ago
haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆310 Updated 2 weeks ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated last year
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆51 Updated 7 months ago
google-deepmind / mishax
☆143 Updated 2 months ago
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆133 Updated 6 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆89 Updated last year
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆122 Updated last year
egozverev / Should-It-Be-Executed-Or-Processed
Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.
☆56 Updated 8 months ago
METR / RE-Bench
☆117 Updated last month
centerforaisafety / emergent-values
Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"
☆83 Updated 8 months ago
google-deepmind / dangerous-capability-evaluations
☆62 Updated last month
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆120 Updated last week
PrimeIntellect-ai / genesys
☆135 Updated 8 months ago
joshuacnf / Ctrl-G
☆104 Updated 10 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆72 Updated 7 months ago
MinhxLe / subliminal-learning
☆104 Updated 3 months ago
open-thought / reasoning-gym-eval
Collection of LLM completions for reasoning-gym task datasets
☆30 Updated 4 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163 Updated 7 months ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆110 Updated 11 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 10 months ago
Mihaiii / llm_steer
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆248 Updated 9 months ago
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆60 Updated 6 months ago
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71 Updated last year
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆59 Updated 6 months ago
teknium1 / transformers-gptq-quant
☆45 Updated 2 years ago
allenai / infinigram-api
☆82 Updated this week