bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group.
☆31 · Updated 3 years ago
Alternatives and similar repositories for data_sourcing:
Users who are interested in data_sourcing are comparing it to the libraries listed below.
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆86 · Updated last week
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆58 · Updated 2 years ago
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 · Updated 3 years ago
- Using short models to classify long texts ☆21 · Updated last year
- ☆22 · Updated 2 years ago
- Hinglish Text Classification ☆30 · Updated last year
- Common Voice Dataset explorer ☆27 · Updated 2 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 · Updated 2 years ago
- Multilingual Emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL2022). ☆7 · Updated last year
- TorchServe+Streamlit for easily serving your HuggingFace NER models ☆31 · Updated 2 years ago
- ☆30 · Updated 3 years ago
- A comprehensive tool for linguistic analysis of communities ☆48 · Updated 3 years ago
- An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently. ☆36 · Updated last year
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). ☆70 · Updated 4 months ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 · Updated 2 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 · Updated last year
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… ☆35 · Updated last year
- Recon NER: debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 · Updated 10 months ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- Code for the paper: Saying No is An Art: Contextualized Fallback Responses for Unanswerable Dialogue Queries ☆19 · Updated 3 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 · Updated 7 months ago
- This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP). ☆33 · Updated 4 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated 10 months ago
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019 ☆28 · Updated 5 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆37 · Updated 3 years ago
- Dutch abusive language data ☆11 · Updated last year
- No Teacher BART distillation experiment for NLI tasks ☆26 · Updated 4 years ago
- Exploring NLP weak supervision approaches to train text classification models. The project is also a prototype for a semi-automated text … ☆22 · Updated 11 months ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP tasks with legal texts. ☆37 · Updated 5 years ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆66 · Updated 2 years ago