crux82 / squad-it
A large-scale dataset for Question Answering in Italian
☆27 · Updated 6 years ago
Alternatives and similar repositories for squad-it
Users interested in squad-it are comparing it to the libraries listed below.
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆80 · Updated last year
- A Python truecasing utility that restores case information for texts ☆89 · Updated 2 years ago
- Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0. ☆103 · Updated 3 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆48 · Updated 3 years ago
- Python-based implementation of the Translate-Align-Retrieve method to automatically translate the SQuAD Dataset to Spanish. ☆59 · Updated 2 years ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 · Updated 2 years ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 9 months ago
- [EMNLP-Findings 2020] Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences ☆63 · Updated last year
- BERT models for many languages created from Wikipedia texts ☆33 · Updated 5 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 · Updated last year
- ☆74 · Updated 3 months ago
- A simple neural truecaser written in PyTorch and AllenNLP. ☆33 · Updated last year
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ☆69 · Updated 3 years ago
- Sentence Transformers models for spaCy ☆107 · Updated 2 years ago
- negate_sentence (A Python module that doesn't negate sentences.) ☆31 · Updated 9 months ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 · Updated 3 years ago
- 🐸 KERMIT - A lightweight library to encode and interpret Universal Syntactic Embeddings ☆58 · Updated 2 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- ☆64 · Updated 2 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 · Updated 4 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated 2 years ago
- The French summarization dataset introduced in "BARThez: a Skilled Pretrained French Sequence-to-Sequence Model". ☆23 · Updated 4 years ago
- Dual Encoders for State-of-the-art Natural Language Processing. ☆61 · Updated 2 years ago
- Repository for XLM-T, a framework for evaluating multilingual language models on Twitter data ☆157 · Updated 2 years ago
- Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/ ☆192 · Updated last year
- xfspell: the Transformer Spell Checker ☆190 · Updated 5 years ago
- UmBERTo: an Italian Language Model trained with Whole Word Masking. ☆106 · Updated 2 years ago
- This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild" accepted at EMNLP-2020… ☆33 · Updated 4 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆27 · Updated 3 years ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆152 · Updated 2 years ago