levyfan / sentencepiece-jni
Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
☆38 Updated 2 years ago
Alternatives and similar repositories for sentencepiece-jni
Users who are interested in sentencepiece-jni are comparing it to the libraries listed below:
- Java port of the C++ version of Facebook fastText ☆15 Updated 6 years ago
- A collection of resources on using BERT (https://arxiv.org/abs/1810.04805) and related Language Models in production environments. ☆96 Updated 4 years ago
- Fork of huggingface/pytorch-pretrained-BERT for BERT on STILTs ☆106 Updated 2 years ago
- This repo includes extensions to the Stanford Dialogue Corpus. It contains crowd-sourced rewrites to facilitate research in dialogue stat… ☆91 Updated 6 years ago
- Supplementary material for "When and Why Are Pre-trained Word Embeddings Useful for Neural Machine Translation?" at NAACL 2018 ☆123 Updated last month
- Byte Pair Encoding for Python! ☆231 Updated 3 years ago
- This repository contains the FewGLUE dataset for few-shot natural language understanding. ☆160 Updated 5 years ago
- Dockerized NMT frameworks for nmt-wizard ☆39 Updated 2 years ago
- Symphony Machine Translation ☆38 Updated 5 years ago
- Word Piece Model Python light version with functions tokenize/save/load ☆64 Updated 5 years ago
- Phrase-Indexed Question Answering (PIQA) ☆94 Updated 6 years ago
- Subword Language Model for Query Auto-Completion ☆67 Updated 6 years ago
- Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI) ☆201 Updated 2 years ago
- PyTorch implementation of ALBERT (A Lite BERT for Self-supervised Learning of Language Representations) ☆227 Updated 4 years ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal… ☆81 Updated 4 years ago
- A novel method of constrained decoding for neural NLG (NNLG) models ☆84 Updated 5 years ago
- Corpus preprocessing ☆99 Updated last year
- eXtensible Neural Machine Translation ☆185 Updated last month
- PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset ☆123 Updated 6 years ago
- This repository contains code to replicate the no-longer publicly available Toronto BookCorpus dataset ☆49 Updated 3 years ago
- A Corpus for Multilingual Document Classification in Eight Languages. ☆152 Updated 3 years ago
- Neural Text Generation with Unlikelihood Training ☆310 Updated 4 years ago
- ICLR 2018 Quick-Thought vectors ☆204 Updated 6 years ago
- Resources for the OpenNMT hackathon ☆51 Updated 6 years ago
- Code for "Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models" and "Learning Contextualised Cros… ☆31 Updated 3 years ago
- dstc7-noesis ☆45 Updated 6 years ago
- ☆324 Updated 2 years ago
- Embedding Quantization (Compress Word Embeddings) ☆85 Updated 6 years ago
- Neural machine translation soft alignment visualisations for web and command line ☆72 Updated 4 years ago
- A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert) ☆176 Updated 2 years ago