krangelie / bias-in-german-nlg
Master's thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias-mitigation triggers.
☆16 · Updated 6 months ago
Alternatives and similar repositories for bias-in-german-nlg:
Users who are interested in bias-in-german-nlg are comparing it to the libraries listed below.
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆16 · Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆24 · Updated 4 months ago
- ☆24 · Updated 3 months ago
- ☆21 · Updated 2 months ago
- ☆28 · Updated last year
- Evaluate language models using multiple-choice items ☆13 · Updated last month
- ☆12 · Updated 6 months ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆44 · Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- Using short models to classify long texts ☆21 · Updated 2 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 · Updated 3 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 · Updated 2 years ago
- ☆97 · Updated 2 years ago
- Tools for managing datasets for governance and training. ☆83 · Updated last month
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆26 · Updated 11 months ago
- German Alpaca Dataset (Cleaned + Translated) ☆24 · Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 · Updated last year
- ☆51 · Updated last year
- ☆26 · Updated last month
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆47 · Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆19 · Updated last month
- Experiments for XLM-V Transformers Integration ☆13 · Updated 2 years ago
- Ranking of fine-tuned HF models as base models. ☆35 · Updated last year
- ☆14 · Updated 5 months ago
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. ☆56 · Updated 8 months ago
- Source code and data for Like a Good Nearest Neighbor ☆28 · Updated 2 months ago
- ☆45 · Updated 3 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆84 · Updated 3 weeks ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆23 · Updated 2 weeks ago