haizelabs / sphynx

Sphynx Hallucination Induction

☆53

Alternatives and similar repositories for sphynx

Users who are interested in sphynx are comparing it to the libraries listed below.

haizelabs / get-haized
A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.
☆100 Updated 7 months ago
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆120 Updated 3 weeks ago
JoshuaPurtell / SmallBench
Small, simple agent task environments for training and evaluation
☆19 Updated last year
haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆314 Updated last month
haizelabs / dspy-redteam
Red-Teaming Language Models with DSPy
☆240 Updated 9 months ago
leap-laboratories / PIZZA
An attribution library for LLMs
☆46 Updated last year
haizelabs / bijection-learning
☆26 Updated last year
zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆93 Updated 2 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated last year
JD-P / RetroInstruct
Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search.
☆32 Updated last month
teknium1 / transformers-gptq-quant
☆45 Updated 2 years ago
joshuacnf / Ctrl-G
☆104 Updated 10 months ago
BBischof / yapping
Verbosity control for AI agents
☆64 Updated last year
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆99 Updated 4 months ago
brendanhogan / picoDeepResearch
☆68 Updated 6 months ago
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆182 Updated this week
PrimeIntellect-ai / genesys
☆136 Updated 8 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆90 Updated last year
google-deepmind / mishax
☆144 Updated 2 months ago
invariantlabs-ai / explorer
A better way of testing, inspecting, and analyzing AI Agent traces.
☆40 Updated last month
allenai / infinigram-api
☆87 Updated this week
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆110 Updated 11 months ago
SpellcraftAI / oaib
Use the OpenAI Batch tool to make async batch requests to the OpenAI API.
☆101 Updated last year
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆134 Updated 7 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆107 Updated 8 months ago
GoodAI / goodai-ltm-benchmark
A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…
☆81 Updated 11 months ago
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆60 Updated 6 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59 Updated last month
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218 Updated last month
redotvideo / pluto
Synthetic Data for LLM Fine-Tuning
☆119 Updated 2 years ago