stefan-it / gc4lm
GC4LM: A Colossal (Biased) language model for German
☆13 · May 2, 2021 · Updated 4 years ago
Alternatives and similar repositories for gc4lm
Users that are interested in gc4lm are comparing it to the libraries listed below
- Repository for "Towards Robust Named Entity Recognition for Historic German" · ☆18 · Dec 11, 2020 · Updated 5 years ago
- texrex web page cleaning & ClaraX random walk crawler · ☆11 · Dec 13, 2021 · Updated 4 years ago
- Plan and train German transformer models. · ☆23 · Feb 22, 2021 · Updated 4 years ago
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" · ☆14 · Sep 8, 2022 · Updated 3 years ago
- ☆13 · Feb 12, 2023 · Updated 3 years ago
- ☆13 · Apr 16, 2021 · Updated 4 years ago
- Training data for the NLPContributionGraph Shared Task 11 at SemEval-2021 · ☆14 · Jan 11, 2021 · Updated 5 years ago
- ☆12 · Jun 10, 2021 · Updated 4 years ago
- Neural models for detecting and masking personal information from texts · ☆16 · Nov 25, 2022 · Updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers · ☆38 · Dec 14, 2021 · Updated 4 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF · ☆39 · Mar 8, 2022 · Updated 3 years ago
- German GPT-2 model · ☆32 · Aug 17, 2021 · Updated 4 years ago
- This is a prototype of a semi-automatic data anonymization app for German documents. · ☆23 · Mar 6, 2023 · Updated 2 years ago
- Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging" · ☆16 · May 31, 2019 · Updated 6 years ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… · ☆27 · Updated this week
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at… · ☆21 · Aug 1, 2024 · Updated last year
- The repository for the paper "When Do You Need Billions of Words of Pretraining Data?" · ☆21 · Nov 10, 2020 · Updated 5 years ago
- Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho… · ☆22 · Sep 2, 2022 · Updated 3 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. · ☆21 · Apr 25, 2024 · Updated last year
- Fine-tuned Transformers compatible BERT models for Sequence Tagging · ☆40 · Jul 17, 2020 · Updated 5 years ago
- DHLAB is a library of python modules for accessing text and pictures at the National Library of Norway. · ☆24 · Oct 13, 2025 · Updated 4 months ago
- A CRF-biLSTM based Biomedical NER model in Bioinformatics 2018. · ☆24 · Jul 31, 2018 · Updated 7 years ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models · ☆157 · Dec 6, 2022 · Updated 3 years ago
- Code for ModularQA · ☆28 · Jun 8, 2021 · Updated 4 years ago
- Staged Training for Transformer Language Models · ☆33 · Mar 31, 2022 · Updated 3 years ago
- Code for the paper "A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling" (ACL2018) · ☆29 · Nov 6, 2019 · Updated 6 years ago
- ☆28 · Jul 17, 2019 · Updated 6 years ago
- Python library for converting between BioNLP formats · ☆22 · Apr 20, 2023 · Updated 2 years ago
- LTG-Bert · ☆34 · Jan 8, 2024 · Updated 2 years ago
- ☆30 · Jun 11, 2021 · Updated 4 years ago
- ☆31 · May 26, 2021 · Updated 4 years ago
- Elastic support for Bokmål/Nynorsk · ☆32 · Mar 30, 2017 · Updated 8 years ago
- ☆17 · Feb 7, 2026 · Updated last week
- Discontinuous Data-Oriented Parsing · ☆46 · Jan 5, 2024 · Updated 2 years ago
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning · ☆33 · Jan 9, 2025 · Updated last year
- ☆34 · Sep 7, 2023 · Updated 2 years ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking · ☆87 · Oct 6, 2022 · Updated 3 years ago
- ☆37 · Nov 16, 2017 · Updated 8 years ago
- A tokenizer and sentence splitter for German and English web and social media texts. · ☆151 · Dec 9, 2024 · Updated last year