bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group
☆31Updated 3 years ago
Alternatives and similar repositories for data_sourcing
Users who are interested in data_sourcing are comparing it to the libraries listed below.
- Scripts to convert datasets from various sources to Hugging Face Datasets.☆57Updated 2 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models☆33Updated 3 years ago
- ☆87Updated 3 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP.☆87Updated 3 months ago
- ☆28Updated 2 years ago
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati…☆39Updated 2 years ago
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).☆71Updated 10 months ago
- A comprehensive tool for linguistic analysis of communities☆49Updated 3 years ago
- Using short models to classify long texts☆21Updated 2 years ago
- Execute arbitrary SQL queries on 🤗 Datasets☆32Updated last year
- ☆30Updated 3 years ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models…☆36Updated 3 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs…☆32Updated 4 years ago
- Simple Python client for the Hugging Face Inference API☆73Updated 4 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆66Updated 2 years ago
- Dutch abusive language data☆11Updated last year
- A Streamlit app to add structured tags to a dataset card☆22Updated 3 years ago
- XAI based human-in-the-loop framework for automatic rule-learning.☆49Updated last year
- A PyPI package for easy text annotation in a Jupyter Notebook.☆28Updated 3 years ago
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019☆28Updated 5 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️☆36Updated 3 years ago
- ☆22Updated 3 years ago
- Sequence models in Numpy☆25Updated 4 years ago
- Training a model without a dataset for natural language inference (NLI)☆25Updated 4 years ago
- An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.☆36Updated 2 years ago
- NLP tool to extract emotional phrases from tweets 🤩☆40Updated 3 years ago
- TPU use in single line in colab using tf2 package.☆11Updated 3 years ago
- ☆34Updated 5 years ago
- Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).☆20Updated 3 years ago
- ☆31Updated 2 years ago