facebookresearch / side
The AI Knowledge Editor
☆182Updated 2 years ago
Alternatives and similar repositories for side
Users that are interested in side are comparing it to the libraries listed below
- Pipeline for pulling and processing online language model pretraining data from the web☆177Updated last year
- ☆182Updated last year
- ☆93Updated 4 months ago
- Pretraining Efficiently on S2ORC!☆163Updated 6 months ago
- Used for adaptive human in the loop evaluation of language and embedding models.☆309Updated 2 years ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network☆288Updated 7 months ago
- ☆209Updated 2 months ago
- Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)☆463Updated 2 years ago
- The pipeline for the OSCAR corpus☆168Updated last year
- Evaluation suite for large-scale language models.☆125Updated 3 years ago
- Web-scale retrieval for knowledge-intensive NLP☆553Updated 2 years ago
- minimal pytorch implementation of bm25 (with sparse tensors)☆101Updated last year
- The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper: https://arxiv.org/abs/2212.01349)☆157Updated 2 years ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning☆180Updated 4 months ago
- Question-answers, collected from Google☆129Updated 3 years ago
- A library to synthesize text datasets using Large Language Models (LLM)☆152Updated 2 years ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers☆128Updated last year
- Code and data to support the paper "PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them"☆202Updated 3 years ago
- Repository containing code for "How to Train BERT with an Academic Budget" paper☆313Updated last year
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions☆69Updated 2 years ago
- Tools for managing datasets for governance and training.☆85Updated 3 months ago
- A library for finding knowledge neurons in pretrained transformer models.☆157Updated 3 years ago
- ☆97Updated 2 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆66Updated 2 years ago
- Probabilistic LLM evaluations. [CogSci2023; ACL2023]☆73Updated 9 months ago
- ☆72Updated last year
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI☆94Updated 2 years ago
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference…☆207Updated 4 months ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.☆31Updated last year
- SILO Language Models code repository☆81Updated last year