haizelabs / sphynx
Sphynx Hallucination Induction
☆53 · Updated 2 months ago
Alternatives and similar repositories for sphynx:
Users who are interested in sphynx are comparing it to the libraries listed below.
- Verdict is a library for scaling judge-time compute. ☆190 · Updated last week
- Red-Teaming Language Models with DSPy ☆175 · Updated last month
- ☆22 · Updated 5 months ago
- ☆36 · Updated 2 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆89 · Updated 9 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆87 · Updated last month
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 8 months ago
- Track the progress of LLM context utilisation ☆54 · Updated 8 months ago
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 6 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆56 · Updated 2 weeks ago
- Synthetic Data for LLM Fine-Tuning ☆113 · Updated last year
- Using various instructor clients to evaluate the quality and capabilities of extractions and reasoning. ☆50 · Updated 6 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆85 · Updated this week
- Verbosity control for AI agents ☆60 · Updated 10 months ago
- Letting Claude Code develop its own MCP tools :) ☆91 · Updated 3 weeks ago
- ☆67 · Updated 2 months ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆30 · Updated this week
- ☆48 · Updated last year
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆31 · Updated last month
- SWE Arena ☆28 · Updated this week
- A strongly typed Python DSL for developing message passing multi agent systems ☆52 · Updated 11 months ago
- An attribution library for LLMs ☆38 · Updated 6 months ago
- A framework-less approach to robust agent development. ☆156 · Updated this week
- Logging and caching superpowers for the openai sdk ☆103 · Updated last year
- Use the OpenAI Batch tool to make async batch requests to the OpenAI API. ☆96 · Updated last year
- Train your own SOTA deductive reasoning model ☆81 · Updated 3 weeks ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 · Updated 11 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆47 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆102 · Updated 3 months ago
- ☆107 · Updated last week