bigscience-workshop / data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group
☆31 Updated 3 years ago
Related projects
Alternatives and complementary repositories for data_sourcing
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 Updated 2 years ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 Updated 2 years ago
- Using short models to classify long texts ☆20 Updated last year
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 Updated 9 months ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- NLP Examples using the 🤗 libraries ☆42 Updated 3 years ago
- Scripts for pushing models to huggingface repos ☆11 Updated 3 weeks ago
- Multilingual Emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL2022). ☆5 Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 Updated 8 months ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆85 Updated last month
- A minimal template for creating a pypi package ☆49 Updated 3 years ago
- A Streamlit app to add structured tags to a dataset card ☆22 Updated 2 years ago
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). ☆69 Updated 3 months ago
- spaCy match and replace, maintaining conjugation ☆34 Updated last year
- This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP). ☆33 Updated 4 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts. ☆37 Updated 5 years ago
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆27 Updated 3 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 Updated 3 years ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- Code for the paper: Saying No is An Art: Contextualized Fallback Responses for Unanswerable Dialogue Queries ☆19 Updated 2 years ago
- A starter kit for evaluating benchmarks on the 🤗 Hub ☆13 Updated 10 months ago
- MoodCat😼 classifies the mood of English sentences. ☆14 Updated 2 years ago
- Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations ☆14 Updated 2 years ago
- A comprehensive tool for linguistic analysis of communities ☆48 Updated 3 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 5 months ago
- Training a model without a dataset for natural language inference (NLI) ☆25 Updated 4 years ago