bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group
☆31 Updated 4 years ago
Alternatives and similar repositories for data_sourcing
Users who are interested in data_sourcing are comparing it to the libraries listed below.
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 Updated 3 years ago
- A minimal template for creating a pypi package ☆49 Updated 5 years ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆152 Updated 3 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆35 Updated 3 years ago
- 🛠️ Tools for Transformers compression using PyTorch Lightning ⚡ ☆85 Updated this week
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime. ☆127 Updated 5 years ago
- TimeLMs: Diachronic Language Models from Twitter ☆112 Updated last year
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆30 Updated 4 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆111 Updated 3 years ago
- Asent is a python library for performing efficient and transparent sentiment analysis using spaCy. ☆120 Updated 3 months ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆32 Updated 4 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 Updated last year
- 💫 SpaCy wrapper for ConceptNet 💫 ☆95 Updated last month
- Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters … ☆76 Updated 3 weeks ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆113 Updated last year
- A monolingual and cross-lingual meta-embedding generation and evaluation framework ☆79 Updated 3 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale ☆157 Updated 2 years ago
- Explainable Zero-Shot Topic Extraction ☆65 Updated last year
- Comprehensive NLP Evaluation System ☆188 Updated last year
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 Updated 3 years ago
- Few-shot Named Entity Recognition ☆121 Updated 3 years ago
- Tools for managing datasets for governance and training. ☆87 Updated 2 weeks ago
- A Python library aimed at dissecting and augmenting NER training data. ☆60 Updated 2 years ago
- Dataset of sentences from Hindi stories tagged with different emotion tags ☆11 Updated 6 years ago
- Viewer for the 🤗 datasets library. ☆86 Updated 4 years ago
- Using short models to classify long texts ☆21 Updated 2 years ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated last year
- Some notebooks for NLP ☆207 Updated 2 years ago