cisco-open / polygraphLLM
A library for detecting hallucination and improving LLM factuality
☆18 · Updated 4 months ago
Alternatives and similar repositories for polygraphLLM
Users interested in polygraphLLM are comparing it to the libraries listed below.
- RAI is a Python library written to help AI developers with various aspects of responsible AI development. ☆63 · Updated last year
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆130 · Updated 3 months ago
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆62 · Updated last month
- ☆43 · Updated last year
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆174 · Updated this week
- This is the repository for Blaze, a composable NLP inference pipeline tool. ☆31 · Updated last year
- A toolkit for optimizing machine learning models for practical applications. ☆31 · Updated 11 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆117 · Updated this week
- A method for steering LLMs to better follow instructions. ☆78 · Updated 6 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 · Updated 11 months ago
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆230 · Updated 2 weeks ago
- ☆152 · Updated 5 months ago
- ☆50 · Updated last year
- Language Model for Mainframe Modernization ☆66 · Updated last year
- RuLES: a benchmark for evaluating rule-following in language models ☆248 · Updated 11 months ago
- ☆217 · Updated last week
- Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics) ☆39 · Updated 2 weeks ago
- Red-Teaming Language Models with DSPy ☆250 · Updated 11 months ago
- ☆236 · Updated 3 months ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆315 · Updated last year
- Find and fix bugs in natural language machine learning models using adaptive testing. ☆188 · Updated last year
- A Python library for guardrail models evaluation. ☆30 · Updated 3 months ago
- EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other la… ☆93 · Updated 2 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆188 · Updated last year
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… ☆418 · Updated 2 weeks ago
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆509 · Updated 2 weeks ago
- A Text-Based Environment for Interactive Debugging ☆293 · Updated this week
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆56 · Updated this week
- Harbor is a framework for running agent evaluations and creating and using RL environments. ☆542 · Updated this week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆287 · Updated this week