llm-attacks / llm-attacks
Universal and Transferable Attacks on Aligned Language Models
☆4,037 Updated 11 months ago
Alternatives and similar repositories for llm-attacks
Users who are interested in llm-attacks are comparing it to the libraries listed below.
- New ways of breaking app-integrated LLMs ☆1,952 Updated 2 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,762 Updated 3 weeks ago
- A curation of awesome tools, documents and projects about LLM Security. ☆1,272 Updated 2 months ago
- ☆570 Updated last week
- A unified evaluation framework for large language models ☆2,660 Updated last week
- Robust recipes to align language models with human and AI preferences ☆5,260 Updated this week
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,737 Updated 11 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,565 Updated last month
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆676 Updated 10 months ago
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,800 Updated 6 months ago
- Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI ☆2,041 Updated 11 months ago
- ☆2,529 Updated last year
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,392 Updated last year
- The Official Python Client for Lamini's API ☆2,538 Updated 3 months ago
- ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting wit… ☆1,077 Updated last year
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ☆393 Updated last year
- ☆4,083 Updated last year
- Alpaca dataset from Stanford, cleaned and curated ☆1,560 Updated 2 years ago
- LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transform… ☆1,459 Updated last year
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,674 Updated last year
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models … ☆2,335 Updated this week
- ☆1,024 Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆848 Updated 11 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,447 Updated 2 years ago
- NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. ☆4,872 Updated this week
- Tools for merging pretrained large language models. ☆6,016 Updated 3 weeks ago
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆2,021 Updated 2 years ago
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,378 Updated 3 weeks ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆2,482 Updated 11 months ago
- ☆1,476 Updated 2 years ago