jogonba2 / TWilBert
Specialization of the BERT architecture for both the Spanish language and the Twitter domain
☆13 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for TWilBert
- Code to process Word Usage Graphs ☆10 · Updated last month
- The Spanish Fake News Corpus contains a collection of 971 news items, divided into 491 real and 480 fake. The corpus covers news from … ☆37 · Updated 3 years ago
- German sentiment scores with SentiWS as an extension for spaCy ☆36 · Updated 2 years ago
- Official source for Spanish language models and resources made at BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL). ☆253 · Updated last year
- BERT and ELECTRA models trained on Europeana Newspapers ☆36 · Updated 2 years ago
- CoNLL-U to Pandas DataFrame ☆31 · Updated 7 years ago
- An easy-to-use library to extract indices from texts. ☆29 · Updated 3 years ago
- An annotated corpus of argumentative microtexts ☆39 · Updated 2 years ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 · Updated 2 years ago
- Code for the paper "Content Analysis of Textbooks via Natural Language Processing". ☆56 · Updated last year
- Featurize words into orthographic and phonological vectors. ☆40 · Updated last year
- Unannotated Spanish 3 Billion Words Corpora ☆92 · Updated 2 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆76 · Updated 4 months ago
- A pre-trained language model for social media text in Spanish ☆34 · Updated last year
- Spanish Billion Word Corpus and Embeddings ☆45 · Updated last year
- Dutch coreference resolution & dialogue analysis using deterministic rules ☆21 · Updated last year
- Data for the HIPE 2022 shared task. ☆16 · Updated 11 months ago
- A Python package to enrich Twitter data ☆74 · Updated last year
- Ready-to-use Spanish Word2Vec embeddings created from >18B chars and >3B words ☆41 · Updated 5 years ago
- A French sequence-to-sequence pretrained model ☆57 · Updated 2 years ago
- Ten Thousand German News Articles Dataset for Topic Classification ☆84 · Updated 2 years ago
- Linguistic converter / merging tool for multi-level annotated corpora. Graph-based (using Python and NetworkX). ☆50 · Updated last year
- A Large Automatically-Constructed Resource of Predicate Paraphrases ☆43 · Updated 4 years ago
- 🇧🇪 BelGPT-2: the 1st GPT model pretrained in French. ☆33 · Updated 3 years ago
- SImple SenTence EmbeddeR ☆73 · Updated last year
- A web interface to understand language-specific BERT-models ☆17 · Updated 7 months ago