tamuhey / tokenizations
Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/
☆29 · Updated 4 years ago
Alternatives and similar repositories for tokenizations
Users who are interested in tokenizations are comparing it to the libraries listed below.
- The official implementation of "Distilling Relation Embeddings from Pre-trained Language Models, EMNLP 2021 main conference", a high-qual… ☆46 · Updated last year
- ☆97 · Updated 3 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆58 · Updated 3 years ago
- EMNLP 2021 Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections ☆52 · Updated 4 years ago
- Code for Massive-scale Decoding for Text Generation using Lattices ☆44 · Updated 3 years ago
- ☆31 · Updated 2 years ago
- 🐸 KERMIT - A lightweight library to encode and interpret Universal Syntactic Embeddings ☆58 · Updated 3 years ago
- Few-shot NLP benchmark for unified, rigorous eval ☆93 · Updated 3 years ago
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021 ☆29 · Updated 3 years ago
- Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering ☆175 · Updated 4 years ago
- CrossRE: A Cross-Domain Dataset for Relation Extraction (Findings of EMNLP 2022) ☆49 · Updated last year
- Repository for the Question Answering via Sentence Composition (QASC) dataset ☆56 · Updated 2 years ago
- Code for WikiAsp: Multi-document aspect-based summarization. ☆43 · Updated 5 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆31 · Updated 5 years ago
- Code and data for the paper: "Unsupervised Common Sense Question Answering with Self-Talk" ☆79 · Updated 4 years ago
- This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” ☆85 · Updated 3 years ago
- This repository contains the code for "How many data points is a prompt worth?" ☆48 · Updated 4 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 · Updated 3 years ago
- Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022) ☆80 · Updated 2 years ago
- Source code for paper: Knowledge Inheritance for Pre-trained Language Models ☆38 · Updated 3 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in Pytorch ☆76 · Updated 5 years ago
- ☆49 · Updated 2 years ago
- ☆49 · Updated 2 years ago
- Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles ☆48 · Updated last year
- Neural models of common sense. 🤖 ☆98 · Updated 2 years ago
- A categorical archive of ChatGPT failures ☆64 · Updated 2 years ago
- PyTorch original implementation of "Unsupervised Question Decomposition for Question Answering" ☆122 · Updated 2 years ago
- Data and Code for Paper "Reflect Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality" (EMNLP 2022) ☆11 · Updated 3 years ago
- Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models' ☆17 · Updated 3 years ago
- FEVER (Fact Extraction and VERification) Annotation Platform and Baselines ☆119 · Updated last year