bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group.
☆31 · Updated 3 years ago
Alternatives and similar repositories for data_sourcing
Users interested in data_sourcing are comparing it to the libraries listed below.
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆32 · Updated 4 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 · Updated 4 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 · Updated 3 years ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆152 · Updated 2 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆111 · Updated 2 years ago
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 · Updated 2 years ago
- Comprehensive NLP Evaluation System ☆188 · Updated last year
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). ☆72 · Updated 11 months ago
- ☆110 · Updated last year
- Multilingual Emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL2022). ☆8 · Updated 2 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆87 · Updated 4 months ago
- A benchmark for code-switched NLP, ACL 2020 ☆75 · Updated last year
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 · Updated last year
- A comprehensive tool for linguistic analysis of communities ☆49 · Updated 3 years ago
- Hinglish Text Classification ☆30 · Updated 2 years ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆106 · Updated last year
- An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently. ☆36 · Updated 2 years ago
- TimeLMs: Diachronic Language Models from Twitter ☆109 · Updated last year
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆74 · Updated 3 years ago
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime. ☆127 · Updated 4 years ago
- A minimal template for creating a pypi package ☆49 · Updated 4 years ago
- Text to Speech for Indic languages ☆51 · Updated 3 years ago
- ☆139 · Updated last year
- NLP tool to extract emotional phrase from tweets 🤩 ☆40 · Updated 3 years ago
- ☆17 · Updated 11 months ago
- Asent is a python library for performing efficient and transparent sentiment analysis using spaCy. ☆118 · Updated last year
- 💫 SpaCy wrapper for ConceptNet 💫 ☆94 · Updated last year
- Explainable Zero-Shot Topic Extraction ☆63 · Updated 11 months ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 10 months ago
- This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP). ☆33 · Updated 4 years ago