microsoft / SafeNLP
Safety Score for Pre-Trained Language Models
☆93 · Updated last year
Related projects
Alternatives and complementary repositories for SafeNLP
- This project studies the performance and robustness of language models and task-adaptation methods. ☆141 · Updated 6 months ago
- A unified benchmark for math reasoning ☆87 · Updated last year
- Datasets collection and preprocessings framework for NLP extreme multitask learning ☆149 · Updated 4 months ago
- Token-level Reference-free Hallucination Detection ☆93 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆124 · Updated 3 weeks ago
- A framework for few-shot evaluation of autoregressive language models. ☆101 · Updated last year
- [TMLR'23] Contrastive Search Is What You Need For Neural Text Generation ☆118 · Updated last year
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 7 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆86 · Updated last year
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆79 · Updated 2 months ago
- An experimental implementation of the retrieval-enhanced language model ☆75 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆64 · Updated last month
- Repository for analysis and experiments in the BigCode project. ☆115 · Updated 8 months ago
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions. ☆177 · Updated 2 years ago
- A Multilingual Replicable Instruction-Following Model ☆94 · Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆191 · Updated this week
- Source codes and datasets for How well do Large Language Models perform in Arithmetic tasks? ☆57 · Updated last year
- Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embedd… ☆47 · Updated 4 months ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆85 · Updated 8 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 · Updated 2 years ago
- 🚢 Data Toolkit for Sailor Language Models ☆82 · Updated 4 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆73 · Updated 3 months ago