Dataset collection and preprocessing framework for NLP extreme multitask learning
☆193 · Jul 9, 2025 · Updated 9 months ago
Alternatives and similar repositories for tasksource
Users that are interested in tasksource are comparing it to the libraries listed below.
- Easy modernBERT fine-tuning and multi-task learning ☆64 · Mar 13, 2026 · Updated 3 weeks ago
- Automated Semantic Analysis of Discourse Markers ☆11 · May 30, 2022 · Updated 3 years ago
- Discourse Based Evaluation of Language Understanding ☆21 · Jan 28, 2023 · Updated 3 years ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Jul 12, 2023 · Updated 2 years ago
- Resources accompanying the "Zero-Shot Recommendation as Language Modeling" paper (ECIR2022) ☆14 · May 25, 2023 · Updated 2 years ago
- Task Compass: Scaling Multi-task Pre-training with Task Prefix (EMNLP 2022: Findings) (stay tuned & more will be updated) ☆22 · Oct 17, 2022 · Updated 3 years ago
- Implementation of ModernBERT in MLX ☆20 · Jan 7, 2026 · Updated 3 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best … ☆10 · Nov 3, 2023 · Updated 2 years ago
- Fast whitespace correction with Transformers ☆17 · Aug 22, 2025 · Updated 7 months ago
- Extracts plain text, language identification and more metadata from WARC records ☆23 · Oct 1, 2025 · Updated 6 months ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆98 · Apr 26, 2023 · Updated 2 years ago
- Anh - LAION's multilingual assistant datasets and models ☆28 · Apr 5, 2023 · Updated 3 years ago
- Code accompanying the paper "Pretraining Language Models with Human Preferences" ☆182 · Feb 13, 2024 · Updated 2 years ago
- Utilities for loading and running text embeddings with ONNX ☆45 · Aug 16, 2025 · Updated 7 months ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 · Apr 28, 2023 · Updated 2 years ago
- ☆21 · Oct 6, 2023 · Updated 2 years ago
- Truly flash implementation of DeBERTa's disentangled attention mechanism. ☆84 · Feb 10, 2026 · Updated 2 months ago
- ☆20 · Nov 23, 2022 · Updated 3 years ago
- ☆15 · Oct 24, 2023 · Updated 2 years ago
- ☆11 · Nov 27, 2022 · Updated 3 years ago
- ☆14 · Apr 29, 2025 · Updated 11 months ago
- A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate. ☆38 · Updated this week
- Notebooks for training universal 0-shot classifiers on many different tasks ☆140 · Dec 28, 2024 · Updated last year
- ☆19 · Sep 16, 2025 · Updated 6 months ago
- Flexible, efficient, and context-aware generation from large unstructured knowledge sources. ☆17 · May 7, 2024 · Updated last year
- Mining Discourse Markers for Unsupervised Sentence Representation Learning ☆61 · May 31, 2023 · Updated 2 years ago
- 🥤🧑🏻🚀 Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization… ☆240 · Jan 23, 2026 · Updated 2 months ago
- Multi-task modelling extensions for huggingface transformers ☆21 · Mar 3, 2023 · Updated 3 years ago
- ☆15 · Apr 26, 2025 · Updated 11 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 · Aug 15, 2024 · Updated last year
- Code to create bugged python scripts for OpenAssistant Training, maintained by https://twitter.com/Cyndesama ☆24 · Jul 23, 2023 · Updated 2 years ago
- Using modal.com to process FineWeb-edu data ☆20 · Updated this week
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 2 years ago
- A tiny BERT for low-resource monolingual models ☆31 · Dec 24, 2025 · Updated 3 months ago
- ☆17 · Apr 10, 2024 · Updated 2 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆31 · Jul 9, 2020 · Updated 5 years ago
- Late Interaction Models Training & Retrieval ☆783 · Mar 6, 2026 · Updated last month
- A project designed to extract relevant metadata from databases and transform it into context for Retrieval-Augmented Generation (RAG) in … ☆14 · Aug 6, 2025 · Updated 8 months ago
- Here we collect trick questions and failed tasks for open source LLMs to improve them. ☆32 · Apr 20, 2023 · Updated 2 years ago