bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group
☆31 · Updated 3 years ago
Alternatives and similar repositories for data_sourcing
Users who are interested in data_sourcing are comparing it to the libraries listed below.
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 · Updated 2 years ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 · Updated last year
- ☆87 · Updated 2 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆32 · Updated 4 years ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 · Updated 3 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models ☆33 · Updated 2 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Updated 2 years ago
- A minimal template for creating a PyPI package ☆49 · Updated 4 years ago
- 🤗 Push your spaCy pipelines to the Hugging Face Hub ☆44 · Updated 11 months ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆87 · Updated last month
- An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently. ☆36 · Updated last year
- ☆28 · Updated 2 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 · Updated 4 years ago
- This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP). ☆33 · Updated 4 years ago
- NLP Examples using the 🤗 libraries ☆41 · Updated 4 years ago
- A Streamlit app to add structured tags to a dataset card ☆22 · Updated 2 years ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆36 · Updated 3 years ago
- Bag of, not words, but tricks! ☆68 · Updated last year
- Explainable Zero-Shot Topic Extraction ☆62 · Updated 8 months ago
- NLP tool to extract emotional phrase from tweets 🤩 ☆40 · Updated 3 years ago
- Using short models to classify long texts ☆21 · Updated 2 years ago
- Just another sentiment wrapper. ☆17 · Updated 3 years ago
- A comprehensive tool for linguistic analysis of communities ☆49 · Updated 3 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 · Updated 3 years ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated 2 years ago
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime. ☆126 · Updated 4 years ago
- ☆30 · Updated 3 years ago
- Common Voice Dataset explorer ☆27 · Updated 2 years ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 · Updated last year