CyberZHG / wiki-dump-reader
Extract corpora from Wikipedia dumps
☆25 Updated 6 years ago
Alternatives and similar repositories for wiki-dump-reader
Users interested in wiki-dump-reader compare it to the libraries listed below.
- ☆24 Updated 5 years ago
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆37 Updated 3 years ago
- Language Modelling Makes Sense - WSD (and more) with Contextual Embeddings ☆95 Updated last year
- Preprocessing scripts to read definitions and other information from dictionaries ☆22 Updated 7 years ago
- Frame-Semantic and PropBank Semantic Role Labeling with Syntactic Scaffolding. ☆50 Updated 3 years ago
- XAI Tutorial for the Explainable AI track in the ALPS winter school 2021 ☆58 Updated 4 years ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. ☆31 Updated 5 years ago
- ☆32 Updated 3 years ago
- Data and code for Kang et al., EMNLP 2019's paper titled "(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Ann… ☆29 Updated 5 years ago
- Assessing syntactic abilities of BERT ☆148 Updated 5 years ago
- End-to-end shallow discourse parser ☆20 Updated last year
- Analyzing mBERT's multilinguality in a small laboratory setting ☆13 Updated last year
- A program to choose transfer languages for cross-lingual learning ☆72 Updated 2 years ago
- Statistics on multilingual datasets ☆17 Updated 2 years ago
- COLING 2018 Tutorial on Multilingual FrameNet: Automatic semantic role labeling for FrameNet ☆25 Updated 6 years ago
- Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018) ☆16 Updated 10 months ago
- Code for the paper "Improving Robustness of Machine Translation with Synthetic Noise" ☆21 Updated 5 years ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal… ☆81 Updated 3 years ago
- NLI test set with lexical inferences ☆49 Updated 6 years ago
- A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contai… ☆106 Updated 6 years ago
- Codebase for probing and visualizing multilingual models. ☆48 Updated 5 years ago
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) ☆22 Updated 5 years ago
- Reproduction instructions for "Rapid Adaptation of Neural Machine Translation to New Languages" ☆41 Updated 6 years ago
- Source code for "Improving Robustness of Neural Machine Translation with Multi-task Learning" ☆19 Updated 5 years ago
- ☆25 Updated 3 years ago
- Perspectrum: a dataset of claims, perspectives and evidence documents ☆33 Updated 5 years ago
- Data and all ☆14 Updated 5 years ago
- PyTorch code for the EMNLP 2020 paper "Embedding Words in Non-Vector Space with Unsupervised Graph Learning" ☆41 Updated 4 years ago
- This is a repository for the paper on testing inductive bias with scaled-down RoBERTa models. ☆20 Updated 3 years ago
- Specialising Word Vectors for Lexical Entailment ☆28 Updated 6 years ago