GermanT5 / wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
☆46Jul 17, 2022Updated 3 years ago
Alternatives and similar repositories for wikipedia2corpus
Users who are interested in wikipedia2corpus are comparing it to the libraries listed below.
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l…☆23Feb 22, 2022Updated 3 years ago
- Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT.☆15Dec 25, 2019Updated 6 years ago
- Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models"☆14Aug 19, 2022Updated 3 years ago
- This repository contains code for the paper "Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs" (Wang, Lawrence…☆17Mar 8, 2021Updated 4 years ago
- A dataset for realistic evaluation of noisy label methods☆14Dec 3, 2023Updated 2 years ago
- Code accompanying the paper "Knowledge Base Completion Meets Transfer Learning"☆15Feb 21, 2024Updated last year
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022)☆14Jul 22, 2022Updated 3 years ago
- A data set and model for German sentiment classification.☆68May 30, 2025Updated 8 months ago
- Klexikon: A German Dataset for Joint Summarization and Simplification☆17Oct 5, 2022Updated 3 years ago
- Tools relating to the CC-News-En Collection☆20Dec 8, 2023Updated 2 years ago
- German Alpaca Dataset (Cleaned + Translated)☆26Apr 6, 2023Updated 2 years ago
- Repo for the simplified text alignment tools.☆21Dec 4, 2020Updated 5 years ago
- ☆17Feb 1, 2023Updated 3 years ago
- DWIE (Deutsche Welle corpus for Information Extraction) dataset. Introduced in our "DWIE: an entity-centric dataset for multi-task docume…☆51Jul 23, 2023Updated 2 years ago
- Alignment and annotation for comparable documents.☆22Oct 16, 2018Updated 7 years ago
- A tokenizer and sentence splitter for German and English web and social media texts.☆151Dec 9, 2024Updated last year
- ☆24Jun 12, 2023Updated 2 years ago
- Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization☆28May 23, 2022Updated 3 years ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri…☆34Jul 7, 2022Updated 3 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w…☆31Jan 14, 2023Updated 3 years ago
- Source code of the paper "Do Syntax Trees Help Pre-trained Transformers Extract Information?" (EACL 2021)☆75Dec 29, 2021Updated 4 years ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning☆13Aug 17, 2023Updated 2 years ago
- Comprehensive evaluation framework for Open Information Extraction.☆40Jun 21, 2022Updated 3 years ago
- Python code to automatically produce a summary of a piece of text.☆12Sep 8, 2016Updated 9 years ago
- Standalone Product Key Memory module in Pytorch - for augmenting Transformer models☆87Nov 1, 2025Updated 3 months ago
- ☆40May 4, 2024Updated last year
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- Code and data for the Walert large language model-based chatbot☆12Aug 14, 2025Updated 6 months ago
- Nuod (Numerical Odin) is an Odin library for creating and manipulating numerical multi-dimensional arrays.☆19Nov 15, 2025Updated 2 months ago
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation…☆10Nov 4, 2022Updated 3 years ago
- A library for computing diverse text characteristics and using them to analyze data sets and models with ease.☆41Aug 18, 2022Updated 3 years ago
- ☆14Jul 23, 2020Updated 5 years ago
- Over 50 doors for Minetest☆11Mar 31, 2025Updated 10 months ago
- Since August 2023 we are improving Qaamuska iyo Erayada Afka-Soomaliga (Somali Dictionary and Vocabulary)☆18Oct 16, 2025Updated 3 months ago
- Security research organization dedicated to finding low-hanging, critical vulnerabilities.☆15May 12, 2022Updated 3 years ago
- ☆10Jul 6, 2023Updated 2 years ago
- Midi2PLAY is an application that helps the process of converting MIDI files (.mid) making them compatible with the syntax accepted by the…☆10Dec 30, 2021Updated 4 years ago
- Dockerfile for johnsmith0031/alpaca_lora_4bit☆12Apr 10, 2023Updated 2 years ago
- I will store here a bunch of home assignments that I got (without company names) and their solutions.☆12Feb 21, 2024Updated last year