avidml / evaluating-LLMs
Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs.
☆23 · Updated 2 months ago
Alternatives and similar repositories for evaluating-LLMs
Users that are interested in evaluating-LLMs are comparing it to the libraries listed below
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆66 · Updated 2 years ago
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆178 · Updated last month
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 · Updated last year
- AuditNLG: Auditing Generative AI Language Modeling for Trustworthiness ☆101 · Updated 3 months ago
- Codebase release for an EMNLP 2023 paper ☆19 · Updated this week
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆315 · Updated 10 months ago
- Find and fix bugs in natural language machine learning models using adaptive testing. ☆183 · Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆128 · Updated last year
- Notebooks for training universal 0-shot classifiers on many different tasks ☆125 · Updated 4 months ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆180 · Updated 4 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆179 · Updated last year
- The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models. ☆177 · Updated 3 years ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆91 · Updated this week
- A framework for few-shot evaluation of autoregressive language models. ☆103 · Updated 2 years ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 · Updated 9 months ago
- ☆207 · Updated 4 years ago
- Code for the Multilingual Evaluation of Generative AI paper published at EMNLP 2023 ☆68 · Updated last year
- Repository for research in the field of Responsible NLP at Meta. ☆199 · Updated 5 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆109 · Updated last year
- Token-level Reference-free Hallucination Detection ☆94 · Updated last year
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆62 · Updated last year
- Pipeline for pulling and processing online language model pretraining data from the web ☆177 · Updated last year
- ☆65 · Updated last year
- The Foundation Model Transparency Index ☆78 · Updated 11 months ago
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations. ☆14 · Updated 8 months ago
- Large Language Model evaluation framework with Elo leaderboard and A/B testing ☆52 · Updated 6 months ago
- The codebase for our ACL 2023 paper: Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learni… ☆29 · Updated last year
- Tools for managing datasets for governance and training. ☆85 · Updated 3 months ago
- ☆287 · Updated 2 weeks ago
- ☆72 · Updated last year