Dataset collection and preprocessing framework for extreme multitask learning in NLP
☆193 · Jul 9, 2025 · Updated 7 months ago
Alternatives and similar repositories for tasksource
Users interested in tasksource are comparing it to the libraries listed below.
- Easy ModernBERT fine-tuning and multi-task learning ☆64 · Jul 2, 2025 · Updated 8 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best … ☆10 · Nov 3, 2023 · Updated 2 years ago
- Extracts plain text, language identification and more metadata from WARC records ☆23 · Oct 1, 2025 · Updated 5 months ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆98 · Apr 26, 2023 · Updated 2 years ago
- Task Compass: Scaling Multi-task Pre-training with Task Prefix (EMNLP 2022: Findings) (stay tuned & more will be updated) ☆22 · Oct 17, 2022 · Updated 3 years ago
- ☆11 · Nov 27, 2022 · Updated 3 years ago
- ☆21 · Oct 6, 2023 · Updated 2 years ago
- GPT as Knowledge Worker (or if you really want, GPT Sorta' Takes the CPA Exam) ☆13 · Jan 24, 2023 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆13 · Nov 27, 2023 · Updated 2 years ago
- Utilities for loading and running text embeddings with ONNX ☆45 · Aug 16, 2025 · Updated 6 months ago
- Anh - LAION's multilingual assistant datasets and models ☆27 · Apr 5, 2023 · Updated 2 years ago
- ☆19 · Sep 16, 2025 · Updated 5 months ago
- A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate. ☆38 · Feb 12, 2026 · Updated 2 weeks ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 2 years ago
- Implementation of ModernBERT in MLX ☆20 · Jan 7, 2026 · Updated last month
- ☆17 · Apr 10, 2024 · Updated last year
- Fast whitespace correction with Transformers ☆17 · Aug 22, 2025 · Updated 6 months ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆78 · Feb 10, 2026 · Updated 2 weeks ago
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆180 · Feb 13, 2024 · Updated 2 years ago
- A tiny BERT for low-resource monolingual models ☆31 · Dec 24, 2025 · Updated 2 months ago
- Beyond LM: How can language models go forward in the future? ☆15 · Apr 30, 2023 · Updated 2 years ago
- Leaderboards are widely used in NLP and push the field forward. While leaderboards are a straightforward ranking of NLP models, this simp… ☆18 · Mar 30, 2022 · Updated 3 years ago
- Using modal.com to process FineWeb-edu data ☆20 · Apr 5, 2025 · Updated 10 months ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆31 · Jul 9, 2020 · Updated 5 years ago
- A simple Flask app that lets you text back and forth with Open Interpreter. Probably a bad idea. ☆22 · Oct 7, 2023 · Updated 2 years ago
- Customizable implementation of the self-instruct paper. ☆1,049 · Mar 7, 2024 · Updated last year
- ☆20 · Nov 23, 2022 · Updated 3 years ago
- This project showcases engaging interactions between two AI chatbots. ☆10 · Jan 10, 2024 · Updated 2 years ago
- Easily deploy your RWKV model ☆19 · May 5, 2023 · Updated 2 years ago
- Late Interaction Models Training & Retrieval ☆732 · Updated this week
- An original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi ☆273 · Apr 15, 2023 · Updated 2 years ago
- Training code for Sparse Autoencoders on Embedding models ☆39 · Feb 27, 2025 · Updated last year
- AskUp Search ChatGPT Plugin ☆20 · May 27, 2023 · Updated 2 years ago
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code. ☆22 · Nov 26, 2022 · Updated 3 years ago
- Code to create bugged Python scripts for OpenAssistant Training, maintained by https://twitter.com/Cyndesama ☆24 · Jul 23, 2023 · Updated 2 years ago
- Template Filling with Generative Transformers ☆23 · Jun 8, 2021 · Updated 4 years ago
- Awesome synthetic (text) datasets ☆325 · Jan 8, 2026 · Updated last month
- ☆565 · Nov 20, 2024 · Updated last year
- A CLI in Rust to generate synthetic data for MLX friendly training ☆25 · Jan 13, 2024 · Updated 2 years ago