chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆22 · Updated last month
Related projects
Alternatives and complementary repositories for redeval
- Red-Teaming Language Models with DSPy ☆142 · Updated 7 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆32 · Updated 2 months ago
- Sphynx Hallucination Induction ☆48 · Updated 3 months ago
- Track the progress of LLM context utilisation ☆53 · Updated 4 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 4 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆86 · Updated 5 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆97 · Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 · Updated 8 months ago
- Framework for LLM evaluation, guardrails and security ☆96 · Updated 2 months ago
- LLM Evals for Text Summarization and RAG use-cases. ☆35 · Updated 10 months ago
- Realign is an evaluation and experimentation framework for AI applications. ☆12 · Updated 3 weeks ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆26 · Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Updated 3 months ago
- Synthetic Data for LLM Fine-Tuning ☆97 · Updated 11 months ago
- LLMs as Collaboratively Edited Knowledge Bases ☆43 · Updated 9 months ago
- Transform unstructured documents into actionable, structured data with enterprise-grade precision and reliability, ready for large-scale … ☆11 · Updated last week
- Open Implementations of LLM Analyses ☆94 · Updated last month
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 · Updated last month
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments ☆31 · Updated last week
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆57 · Updated 9 months ago