bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group
☆31 · Updated 3 years ago
Alternatives and similar repositories for data_sourcing:
Users who are interested in data_sourcing are comparing it to the libraries listed below.
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 · Updated 2 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆87 · Updated 2 weeks ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 · Updated 4 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆31 · Updated 4 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 · Updated 2 years ago
- A Streamlit app to add structured tags to a dataset card ☆22 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- Training a model without a dataset for natural language inference (NLI) ☆25 · Updated 4 years ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆17 · Updated 8 months ago
- A minimal template for creating a pypi package ☆49 · Updated 4 years ago
- 🤗 Push your spaCy pipelines to the Hugging Face Hub ☆43 · Updated 10 months ago
- Using short models to classify long texts ☆21 · Updated 2 years ago
- Viewer for the 🤗 datasets library. ☆84 · Updated 3 years ago
- Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations ☆14 · Updated 2 years ago
- Common Voice Dataset explorer ☆27 · Updated 2 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models ☆33 · Updated 2 years ago
- Visualise, evaluate, and manage annotated data ☆33 · Updated 2 years ago
- REMERGE - Multi-Word Expression discovery algorithm ☆14 · Updated 2 years ago
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 · Updated 3 years ago
- Dutch abusive language data ☆11 · Updated last year
- A collection of my NLP projects ☆19 · Updated 5 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Updated last year
- NLP Examples using the 🤗 libraries ☆41 · Updated 4 years ago
- NERtwork is a collection of scripts to help you create a network graph of co-occurring named entities using open source tools. This is do… ☆48 · Updated last year
- Dataset of sentences from Hindi stories tagged with different emotion tags ☆11 · Updated 5 years ago