bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group
☆31 Updated 3 years ago
Alternatives and similar repositories for data_sourcing:
Users who are interested in data_sourcing are comparing it to the libraries listed below.
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆58 Updated 2 years ago
- Common Voice Dataset explorer ☆27 Updated 2 years ago
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆86 Updated last month
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 Updated 3 years ago
- MoodCat😼 classifies the mood of English sentences. ☆14 Updated 2 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- ☆23 Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆58 Updated last year
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆31 Updated 3 years ago
- ☆87 Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated 11 months ago
- Asent is a python library for performing efficient and transparent sentiment analysis using spaCy. ☆117 Updated 10 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 Updated 2 years ago
- Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations ☆14 Updated 2 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 8 months ago
- ☆30 Updated 3 years ago
- A monolingual and cross-lingual meta-embedding generation and evaluation framework ☆80 Updated 2 years ago
- Topic Inference with Zeroshot models ☆61 Updated last year
- A comprehensive tool for linguistic analysis of communities ☆49 Updated 3 years ago
- A Streamlit component for annotating text by text selecting. ☆40 Updated 8 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆67 Updated 2 years ago
- An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently. ☆36 Updated last year
- NLP Examples using the 🤗 libraries ☆41 Updated 3 years ago
- Explainable Zero-Shot Topic Extraction ☆62 Updated 6 months ago
- ☆28 Updated last year
- ☆18 Updated last year
- Bag of, not words, but tricks! ☆68 Updated last year
- A minimal template for creating a pypi package ☆49 Updated 4 years ago
- 🤗 Push your spaCy pipelines to the Hugging Face Hub ☆44 Updated 8 months ago