avidml / evaluating-LLMs
Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs.
☆23 · Updated 2 weeks ago
Alternatives and similar repositories for evaluating-LLMs:
Users who are interested in evaluating-LLMs are comparing it to the libraries listed below.
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 · Updated 2 years ago
- Find and fix bugs in natural language machine learning models using adaptive testing. ☆183 · Updated 10 months ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated last year
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ☆93 · Updated last year
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆172 · Updated last year
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible. ☆41 · Updated 3 weeks ago
- ☆13 · Updated 2 years ago
- A Python package for benchmarking interpretability techniques on Transformers. ☆213 · Updated 6 months ago
- Inquisitive Parrots for Search ☆189 · Updated last year
- Annotated corpus + evaluation metrics for text anonymisation ☆55 · Updated last year
- Command Line Interface for Hugging Face Inference Endpoints ☆66 · Updated 11 months ago
- ☆43 · Updated last year
- Explainable Zero-Shot Topic Extraction ☆62 · Updated 7 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆126 · Updated last year
- ☆126 · Updated 2 years ago
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations. ☆14 · Updated 6 months ago
- Vespa application that builds an index of the CORD-19 dataset. ☆39 · Updated 2 months ago
- Codebase release for an EMNLP 2023 paper ☆19 · Updated last year
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" ☆48 · Updated 2 years ago
- ☆65 · Updated last year
- The Foundation Model Transparency Index ☆77 · Updated 10 months ago
- Tools for managing datasets for governance and training. ☆83 · Updated last month
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021. ☆21 · Updated 2 years ago
- Large language model evaluation framework with an Elo leaderboard and A/B testing ☆51 · Updated 5 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆103 · Updated last year
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆83 · Updated this week
- Using short models to classify long texts ☆21 · Updated 2 years ago
- Notebooks for training universal zero-shot classifiers on many different tasks ☆120 · Updated 3 months ago
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements: "GPL: … ☆330 · Updated last year
- A library to synthesize text datasets using Large Language Models (LLMs) ☆151 · Updated 2 years ago