haizelabs / dspy-redteam
Red-Teaming Language Models with DSPy
☆116 · Updated 5 months ago
Related projects:
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆77 · Updated 3 months ago
- Sphynx Hallucination Induction ☆44 · Updated last month
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆49 · Updated 3 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆107 · Updated 2 weeks ago
- A trace analysis tool for AI agents. ☆97 · Updated this week
- Just a bunch of benchmark logs for different LLMs ☆112 · Updated last month
- Functional Benchmarks and the Reasoning Gap ☆74 · Updated last month
- AWM: Agent Workflow Memory ☆121 · Updated this week
- Mixing Language Models with Self-Verification and Meta-Verification ☆96 · Updated 10 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. ☆48 · Updated 3 weeks ago
- Track the progress of LLM context utilisation ☆53 · Updated 2 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆82 · Updated last month
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆72 · Updated last week
- Synthetic Data for LLM Fine-Tuning ☆78 · Updated 9 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 2 months ago
- ☆91 · Updated last month
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024] ☆181 · Updated last month
- ☆38 · Updated this week
- Testing speed and accuracy of RAG with, and without Cross Encoder Reranker. ☆45 · Updated 8 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆70 · Updated last month
- Evaluating LLMs with CommonGen-Lite ☆83 · Updated 5 months ago
- ☆34 · Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆65 · Updated 2 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆93 · Updated 5 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆102 · Updated 3 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆192 · Updated 4 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Updated last month
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆53 · Updated 2 months ago
- ReLM is a Regular Expression engine for Language Models ☆100 · Updated last year
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆86 · Updated 3 months ago