lechmazur / deception
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.
☆14Updated last month
Related projects
Alternatives and complementary repositories for deception
- ☆40Updated 6 months ago
- Modified Beam Search with periodical restart☆12Updated 2 months ago
- Hallucinations (Confabulations) Document-Based Benchmark for RAG☆49Updated 2 weeks ago
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci…☆23Updated last year
- Example implementation of Iteration of Thought - Give a star if you like the project☆33Updated last week
- Evolutionary Search for expert-level performance on any task with environmental feedback☆14Updated 9 months ago
- ☆37Updated this week
- Grok by X (Twitter) System Prompt Leak☆23Updated 11 months ago
- One Line To Build Zero-Data Classifiers in Minutes☆33Updated 2 months ago
- Genetics for Language Models☆12Updated 4 months ago
- Routing on Random Forest (RoRF)☆84Updated 2 months ago
- ☆36Updated 3 weeks ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r…☆57Updated 4 months ago
- An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, …☆22Updated 3 months ago
- Running Microsoft's BitNet via Electron, React & Astro☆14Updated 2 weeks ago
- A clone of OpenAI's Tokenizer page for HuggingFace Models☆44Updated last year
- Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon☆15Updated 2 weeks ago
- ☆39Updated 11 months ago
- Get a markdown version of any webpage with a keyboard shortcut.☆37Updated this week
- Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning☆21Updated 2 weeks ago
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆56Updated 3 weeks ago
- ☆12Updated 2 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆48Updated 4 months ago
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1…☆14Updated 10 months ago
- The next evolution of Agents☆46Updated 2 weeks ago
- OpenAI GPT hosted Agent Framework for Windows and macOS☆36Updated 4 months ago
- PII Masker is an open-source tool for protecting sensitive data by automatically detecting and masking PII using advanced AI, powered by …☆42Updated this week
- ☆12Updated 4 months ago
- Using modal.com to process FineWeb-edu data☆19Updated 2 months ago