GC4LM: A Colossal (Biased) language model for German
☆13May 2, 2021Updated 5 years ago
Alternatives and similar repositories for gc4lm
Users who are interested in gc4lm are comparing it to the libraries listed below.
- Repository for "Towards Robust Named Entity Recognition for Historic German"☆18Dec 11, 2020Updated 5 years ago
- texrex web page cleaning & ClaraX random walk crawler☆11Dec 13, 2021Updated 4 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆39Dec 14, 2021Updated 4 years ago
- Neural models for detecting and masking personal information from texts☆16Nov 25, 2022Updated 3 years ago
- Norwegian Speech Transformer Models☆19Mar 26, 2026Updated 2 months ago
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- Implementation of the GLOM model for text☆11Mar 4, 2021Updated 5 years ago
- Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho…☆22Sep 2, 2022Updated 3 years ago
- ☆13Feb 12, 2023Updated 3 years ago
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding"☆14Sep 8, 2022Updated 3 years ago
- [ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction☆13Apr 21, 2020Updated 6 years ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource…☆27Feb 16, 2026Updated 3 months ago
- Training data for the NLPContributionGraph Shared Task 11 at SemEval-2021☆14Jan 11, 2021Updated 5 years ago
- Elastic support for Bokmål/Nynorsk☆32Mar 30, 2017Updated 9 years ago
- KB data lab☆10Dec 8, 2020Updated 5 years ago
- RDF river plugin for harvesting metadata from Jena TDB, SPARQL endpoints or plain RDF files into Elasticsearch☆10May 20, 2022Updated 4 years ago
- Check your modified Ground Truth files with visual support!☆10Jan 31, 2024Updated 2 years ago
- Cloud and Kubernetes configuration for deployment for wbstack.com. You'll want to look at the wikibase.cloud deploy repository soon!☆12Feb 9, 2024Updated 2 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF☆39Mar 8, 2022Updated 4 years ago
- Generative tree visualiser for Python☆16Sep 15, 2020Updated 5 years ago
- This is a prototype of a semi-automatic data anonymization app for German documents. ➡️ The project has moved to: https://gitlab.opencode…☆24Mar 20, 2026Updated 2 months ago
- A software for transferring pre-trained English models to foreign languages☆19Mar 20, 2023Updated 3 years ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models☆158Dec 6, 2022Updated 3 years ago
- ALTO XML schema - latest and all former versions☆55Jan 20, 2026Updated 4 months ago
- Discontinuous Data-Oriented Parsing☆46Jan 5, 2024Updated 2 years ago
- Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"☆16May 31, 2019Updated 6 years ago
- NewsEye / READ OCR training dataset from Austrian Newspapers (1864–1911)☆18Oct 31, 2025Updated 6 months ago
- ☆12Jun 10, 2021Updated 4 years ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups☆20May 13, 2026Updated 2 weeks ago
- MediaWiki extension that adds support for local media files to Wikibase via a new data type.☆12Mar 26, 2026Updated 2 months ago
- Dense Passage Retrieval using tensorflow-keras on TPU☆17Jun 27, 2021Updated 4 years ago
- A tokenizer and sentence splitter for German and English web and social media texts.☆152Dec 9, 2024Updated last year
- German Alpaca Dataset (Cleaned + Translated)☆26Apr 6, 2023Updated 3 years ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking☆86Oct 6, 2022Updated 3 years ago
- Analysis of the Pegida Facebook corpus☆10Jan 31, 2015Updated 11 years ago
- Named Entity Recognition☆19Feb 13, 2026Updated 3 months ago
- Named Entity Disambiguation and Linking☆16May 24, 2024Updated 2 years ago
- Professor forcing future code☆10Sep 22, 2018Updated 7 years ago
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23)☆18Jun 24, 2024Updated last year