transducens / linguacrawl
Crawling engine that crawls a set of top-level domains looking for documents in a list of languages
☆11 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for linguacrawl
- BERT models for many languages created from Wikipedia texts · ☆34 · Updated 4 years ago
- A simple neural truecaser written in PyTorch and AllenNLP. · ☆32 · Updated 5 months ago
- Generate BERT vocabularies and pretraining examples from Wikipedias · ☆18 · Updated 4 years ago
- Statistics on multilingual datasets · ☆17 · Updated 2 years ago
- (no description) · ☆22 · Updated 2 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow · ☆14 · Updated last year
- GC4LM: A Colossal (Biased) language model for German · ☆13 · Updated 3 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) · ☆46 · Updated 3 years ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks · ☆12 · Updated 3 years ago
- A web interface to understand language-specific BERT models · ☆17 · Updated 7 months ago
- An implementation of GrASP (Shnarch et al., 2017) · ☆21 · Updated 2 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. · ☆34 · Updated 2 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT · ☆21 · Updated last year
- SemEval-2021 Multilingual and Cross-lingual Word-in-Context Task · ☆18 · Updated 3 years ago
- Converter from UD trees to BART representation · ☆36 · Updated 8 months ago
- Efficient Sentence Embedding via Semantic Subspace Analysis · ☆14 · Updated 4 years ago
- An embeddable annotation tool for end-to-end cross-document coreference · ☆41 · Updated last year
- (no description) · ☆17 · Updated last year
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. · ☆85 · Updated last month
- A set of methods for finding an appropriate number of topics in a text collection · ☆14 · Updated 3 months ago
- Multilingual Open Text · ☆25 · Updated 3 weeks ago
- Implementation of Nested Named Entity Recognition using Flair · ☆24 · Updated 3 years ago
- Combining encoder-based language models · ☆11 · Updated 3 years ago
- (no description) · ☆73 · Updated 3 years ago
- Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statement… · ☆16 · Updated 3 years ago
- LAReQA is a challenging benchmark for evaluating language agnostic answer retrieval from a multilingual candidate pool. This repository c… · ☆14 · Updated 4 years ago
- Numeric fused-head identification and resolution · ☆33 · Updated 5 years ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. · ☆31 · Updated 4 years ago
- Efficient Sentence Embedding using Discrete Cosine Transform · ☆17 · Updated 4 years ago
- (no description) · ☆16 · Updated last year