tamuhey / tokenizations
Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/
☆29 Updated 3 years ago
Alternatives and similar repositories for tokenizations
Users who are interested in tokenizations are comparing it to the libraries listed below:
- ☆21 Updated 3 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆56 Updated 2 years ago
- EMNLP 2021 Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections ☆50 Updated 3 years ago
- Few-shot NLP benchmark for unified, rigorous eval ☆91 Updated 2 years ago
- ☆19 Updated 5 years ago
- Multilingual Compositional Wikidata Questions (MCWQ) ☆18 Updated 2 years ago
- [Work in progress] A reading list for machine commonsense reasoning ☆35 Updated 5 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 Updated 3 years ago
- Baseline models for the paper: "Modeling Naive Psychology of Characters in Simple Commonsense Stories" by Hannah Rashkin, Antoine Bosselu… ☆16 Updated 4 years ago
- The official implementation of "Distilling Relation Embeddings from Pre-trained Language Models, EMNLP 2021 main conference", a high-qual… ☆47 Updated 6 months ago
- ☆21 Updated 2 years ago
- 🐸 KERMIT - A lightweight library to encode and interpret Universal Syntactic Embeddings ☆58 Updated 2 years ago
- Code for Massive-scale Decoding for Text Generation using Lattices ☆44 Updated 2 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in Pytorch ☆76 Updated 4 years ago
- NLG and NLU for dialogue processing ☆42 Updated 2 years ago
- A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering ☆44 Updated 3 years ago
- ☆97 Updated 2 years ago
- Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://a… ☆46 Updated 2 years ago
- Code and data for the paper: "Unsupervised Common Sense Question Answering with Self-Talk" ☆78 Updated 3 years ago
- CrossRE: A Cross-Domain Dataset for Relation Extraction (Findings of EMNLP 2022) ☆49 Updated 10 months ago
- Source code of the paper "Do Syntax Trees Help Pre-trained Transformers Extract Information?" (EACL 2021) ☆75 Updated 3 years ago
- Code for WikiAsp: Multi-document aspect-based summarization. ☆41 Updated 4 years ago
- Evaluating Machines by their Real-World Language Use ☆33 Updated 2 years ago
- EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation ☆97 Updated 2 years ago
- Source code for paper: Knowledge Inheritance for Pre-trained Language Models ☆38 Updated 3 years ago
- ☆49 Updated 2 years ago
- ☆39 Updated 2 years ago
- UNISUMM: Unified Few-shot Summarization with Multi-Task Pre-Training and Prefix-Tuning ☆60 Updated 2 years ago
- This repository contains the dataset and the pytorch implementations of the models from the paper CIDER: Commonsense Inference for Dialog… ☆27 Updated 2 years ago
- Code for ModularQA ☆28 Updated 4 years ago