Tixierae / OrangeSum
The French summarization dataset introduced in "BARThez: a Skilled Pretrained French Sequence-to-Sequence Model".
☆22Updated 3 years ago
Related projects
Alternatives and complementary repositories for OrangeSum
- A French sequence-to-sequence pretrained model☆57Updated 2 years ago
- Dynamic ensemble decoding with transformer-based models☆29Updated last year
- A spaCy custom component that extracts and normalizes temporal expressions☆52Updated last year
- 🐸 KERMIT - A lightweight library to encode and interpret Universal Syntactic Embeddings☆58Updated last year
- Efficient-Sentence-Embedding-using-Discrete-Cosine-Transform☆17Updated 4 years ago
- ☆12Updated 2 years ago
- Explainable Zero-Shot Topic Extraction☆61Updated 3 months ago
- A repository for our AAAI-2020 Cross-lingual-NER paper. Code will be updated shortly.☆46Updated last year
- Implementation, trained models and result data for the paper "Aspect-based Document Similarity for Research Papers" #COLING2020☆62Updated 6 months ago
- Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0.☆101Updated 2 years ago
- Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statement…☆16Updated 3 years ago
- Statistics on multilingual datasets☆17Updated 2 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021)☆46Updated 3 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do…☆76Updated 4 months ago
- Shared code for training sentence embeddings with Flax / JAX☆27Updated 3 years ago
- A Python library aimed at dissecting and augmenting NER training data.☆56Updated last year
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/☆82Updated last month
- ☆73Updated 3 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts.☆151Updated 5 months ago
- Multi-task modelling extensions for huggingface transformers☆18Updated last year
- Generate BERT vocabularies and pretraining examples from Wikipedias☆18Updated 4 years ago
- The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.☆36Updated 2 years ago
- An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.☆37Updated last year
- ☆35Updated 2 years ago
- Temporarily remove unused tokens during training to save RAM and speed up training.☆22Updated 4 months ago
- BERT and ELECTRA models trained on Europeana Newspapers☆36Updated 2 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow☆14Updated last year
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆26Updated 3 years ago
- A PyPI package for easy text annotation in a Jupyter Notebook.☆27Updated 3 years ago
- GC4LM: A Colossal (Biased) language model for German☆13Updated 3 years ago