haizelabs / sphynx
Sphynx Hallucination Induction
☆53Updated 5 months ago
Alternatives and similar repositories for sphynx
Users who are interested in sphynx are comparing it to the libraries listed below.
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.☆92Updated 3 months ago
- Inference-time scaling for LLMs-as-a-judge.☆250Updated last week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research.☆99Updated 2 weeks ago
- Red-Teaming Language Models with DSPy☆202Updated 5 months ago
- Just a bunch of benchmark logs for different LLMs☆119Updated 11 months ago
- ☆47Updated last year
- Track the progress of LLM context utilisation☆55Updated 2 months ago
- Use the OpenAI Batch tool to make async batch requests to the OpenAI API.☆99Updated last year
- Synthetic Data for LLM Fine-Tuning☆119Updated last year
- Small, simple agent task environments for training and evaluation☆18Updated 8 months ago
- ☆23Updated 8 months ago
- Verbosity control for AI agents☆64Updated last year
- ☆69Updated last month
- A framework for optimizing DSPy programs with RL☆89Updated this week
- smolLM with Entropix sampler on pytorch☆150Updated 8 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆84Updated 9 months ago
- ☆64Updated last month
- ☆134Updated 3 months ago
- explore token trajectory trees on instruct and base models☆134Updated last month
- ⚖️ Awesome LLM Judges ⚖️☆107Updated 2 months ago
- ☆86Updated 6 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆71Updated 3 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆54Updated 5 months ago
- ☆128Updated 3 months ago
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.☆23Updated 3 months ago
- Train your own SOTA deductive reasoning model☆96Updated 4 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆140Updated 4 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆41Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap☆88Updated 9 months ago