Guilherme-Routar / Twikenizer
This repository hosts the code for a tokenizer of tweets.
☆12 · Updated 5 years ago
Related projects
Alternatives and complementary repositories for Twikenizer
- Statistics on multilingual datasets ☆17 · Updated 2 years ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · Updated 2 years ago
- Efficient Sentence Embedding via Semantic Subspace Analysis ☆14 · Updated 4 years ago
- Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models' ☆17 · Updated 2 years ago
- Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics". ☆16 · Updated 2 years ago
- The dataset and code for ACL 2022 paper "SciNLI: A Corpus for Natural Language Inference on Scientific Text" are released here. ☆25 · Updated last year
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆31 · Updated 2 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆30 · Updated 4 years ago
- Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization ☆28 · Updated 2 years ago
- Code for the paper SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts (AKBC 2021). https://openreview.net/forum?id=OF… ☆26 · Updated 3 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆43 · Updated 3 months ago
- Modular implementation of an AM dependency parser in AllenNLP. ☆30 · Updated 5 months ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago
- Codebase for probing and visualizing multilingual models. ☆45 · Updated 4 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆26 · Updated 3 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering ☆38 · Updated 3 years ago
- Perturbation CheckLists for Evaluating NLG Evaluation Metrics, EMNLP 2021 ☆9 · Updated 2 years ago
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆13 · Updated 5 months ago
- Analyzing mBERT's multilinguality in a small laboratory setting ☆13 · Updated last year
- Dynamic ensemble decoding with transformer-based models ☆29 · Updated last year
- Participant Kit for the TextGraphs-15 Shared Task on Explanation Regeneration ☆19 · Updated 3 years ago
- Dependency Parsing as Sequence Labeling ☆26 · Updated 3 months ago
- DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence ☆35 · Updated last year
- Repository for the paper on testing inductive bias with scaled-down RoBERTa models. ☆19 · Updated 2 years ago