Dataset collection and preprocessing framework for NLP extreme multitask learning
☆195Jul 9, 2025Updated 9 months ago
Alternatives and similar repositories for tasksource
Users that are interested in tasksource are comparing it to the libraries listed below.
- Easy modernBERT fine-tuning and multi-task learning☆65Mar 13, 2026Updated last month
- Automated Semantic Analysis of Discourse Markers☆11May 30, 2022Updated 3 years ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Jul 12, 2023Updated 2 years ago
- Resources accompanying the "Zero-Shot Recommendation as Language Modeling" paper (ECIR2022)☆14May 25, 2023Updated 2 years ago
- Task Compass: Scaling Multi-task Pre-training with Task Prefix (EMNLP 2022: Findings) (stay tuned & more will be updated)☆22Oct 17, 2022Updated 3 years ago
- Implementation of ModernBERT in MLX☆21Jan 7, 2026Updated 3 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best …☆10Nov 3, 2023Updated 2 years ago
- Fast whitespace correction with Transformers☆17Aug 22, 2025Updated 8 months ago
- Extracts plain text, language identification and more metadata from WARC records☆23Apr 16, 2026Updated 2 weeks ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning☆99Apr 26, 2023Updated 3 years ago
- Anh - LAION's multilingual assistant datasets and models☆28Apr 5, 2023Updated 3 years ago
- Code accompanying the paper Pretraining Language Models with Human Preferences☆181Feb 13, 2024Updated 2 years ago
- utilities for loading and running text embeddings with onnx☆45Aug 16, 2025Updated 8 months ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…☆14Apr 28, 2023Updated 3 years ago
- ☆21Oct 6, 2023Updated 2 years ago
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆87Feb 10, 2026Updated 2 months ago
- Don't just regulate gradients like in Muon, regulate the weights too☆32Jul 30, 2025Updated 9 months ago
- ☆20Nov 23, 2022Updated 3 years ago
- ☆15Oct 24, 2023Updated 2 years ago
- ☆11Nov 27, 2022Updated 3 years ago
- ☆14Apr 29, 2025Updated last year
- A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate.☆39Apr 22, 2026Updated last week
- Notebooks for training universal 0-shot classifiers on many different tasks☆140Dec 28, 2024Updated last year
- ☆19Updated this week
- Mining Discourse Markers for Unsupervised Sentence Representation Learning☆61May 31, 2023Updated 2 years ago
- 🥤🧑🏻🚀Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization…☆240Jan 23, 2026Updated 3 months ago
- Multi-task modelling extensions for huggingface transformers☆21Mar 3, 2023Updated 3 years ago
- ☆15Apr 26, 2025Updated last year
- Tool to apply Legal Matter Specification Standard (LMSS) to documents☆12Aug 15, 2024Updated last year
- Code to create bugged python scripts for OpenAssistant Training, maintained by https://twitter.com/Cyndesama☆24Jul 23, 2023Updated 2 years ago
- Using modal.com to process FineWeb-edu data☆20Apr 11, 2026Updated 3 weeks ago
- Correction of spaces with character-based neural language models.☆13Aug 23, 2022Updated 3 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings☆15May 3, 2023Updated 3 years ago
- A tiny BERT for low-resource monolingual models☆31Dec 24, 2025Updated 4 months ago
- ☆17Apr 10, 2024Updated 2 years ago
- Late Interaction Models Training & Retrieval☆796Updated this week
- StAtutory Reasoning Assessment☆17Dec 8, 2022Updated 3 years ago
- One stop shop for all things carp☆59Sep 9, 2022Updated 3 years ago
- Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset☆13Nov 19, 2022Updated 3 years ago