spyysalo / wiki-bert-pipeline
Generate BERT vocabularies and pretraining examples from Wikipedias
☆17 Updated 5 years ago
Alternatives and similar repositories for wiki-bert-pipeline
Users who are interested in wiki-bert-pipeline are comparing it to the libraries listed below.
- BERT models for many languages created from Wikipedia texts ☆33 Updated 5 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆48 Updated 3 years ago
- ☆76 Updated 4 years ago
- Factorization of the neural parameter space for zero-shot multi-lingual and multi-task transfer ☆39 Updated 4 years ago
- Participant Kit for the TextGraphs-15 Shared Task on Explanation Regeneration ☆19 Updated 3 years ago
- ☆16 Updated last year
- ML Reproducibility Challenge 2020: Electra reimplementation using PyTorch and Transformers ☆12 Updated 4 years ago
- 🐸 KERMIT - A lightweight library to encode and interpret Universal Syntactic Embeddings ☆58 Updated 2 years ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 Updated 3 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆27 Updated 3 years ago
- Statistics on multilingual datasets ☆17 Updated 3 years ago
- ☆47 Updated 5 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆51 Updated 7 months ago
- Efficient-Sentence-Embedding-using-Discrete-Cosine-Transform ☆17 Updated 5 years ago
- pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference ☆62 Updated 2 years ago
- CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training ☆32 Updated 2 years ago
- A web interface to understand language-specific BERT-models ☆18 Updated last year
- ☆13 Updated 4 years ago
- Implementation of Nested Named Entity Recognition using Flair ☆24 Updated 3 years ago
- Code for the paper: Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, ICLR 2019. ☆43 Updated 3 years ago
- Codebase for probing and visualizing multilingual models. ☆49 Updated 5 years ago
- ☆68 Updated 2 months ago
- Chu-Liu-Edmonds decoding extracted from TurboParser ☆14 Updated 8 years ago
- Data Programming by Demonstration (DPBD) for Document Classification ☆35 Updated 4 years ago
- This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer … ☆55 Updated 4 years ago
- ☆29 Updated 3 years ago
- Code for the paper "Latent Relation Language Models" at AAAI-20. ☆41 Updated 4 years ago
- diagNNose is a Python library that facilitates a broad set of tools for analysing hidden activations of neural models. ☆82 Updated last year
- ☆21 Updated 2 years ago
- Geometry-aware Multilingual Embeddings ☆26 Updated 2 years ago