Pleias / toxic-commons
The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data.
☆9 Updated last week
Related projects
Alternatives and complementary repositories for toxic-commons
- Code for SaGe subword tokenizer (EACL 2023) ☆22 Updated this week
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆17 Updated last month
- ☆20 Updated last year
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆13 Updated 9 months ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆19 Updated 4 months ago
- GC4LM: A Colossal (Biased) language model for German ☆13 Updated 3 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆11 Updated last year
- ☆21 Updated last week
- Multilingual Open Text ☆25 Updated 3 weeks ago
- An easy-to-use API for analyzing INCEpTION annotation projects. ☆16 Updated last year
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆23 Updated 3 months ago
- Using short models to classify long texts ☆20 Updated last year
- A Python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆25 Updated 2 months ago
- GlotCC Dataset and Pipeline -- NeurIPS 2024 ☆16 Updated 2 weeks ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆21 Updated last month
- Source code and data for Like a Good Nearest Neighbor ☆28 Updated 9 months ago
- LTG-Bert ☆29 Updated 10 months ago
- Neural models for detecting and masking personal information from texts ☆14 Updated last year
- A BERT-based application for reusable text classification at scale ☆37 Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆23 Updated 7 months ago
- Small Python package to measure OCR quality and other related metrics. ☆21 Updated 9 months ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆21 Updated 2 years ago
- Library for fast text representation and classification. ☆28 Updated 10 months ago
- A survey of corpora for Germanic low-resource languages and dialects ☆24 Updated 3 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 Updated 5 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆29 Updated last year
- ☆12 Updated 2 years ago
- Tool for parsing and converting various span encoding schemes. ☆22 Updated 10 months ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆42 Updated last year
- BERT and ELECTRA models trained on Europeana Newspapers ☆36 Updated 2 years ago