Guilherme-Routar / Twikenizer

This repository hosts the code for a tokenizer of tweets.

☆12

Alternatives and similar repositories for Twikenizer

Users who are interested in Twikenizer are comparing it to the libraries listed below.

mayhewsw / multilingual-data-stats
Statistics on multilingual datasets
☆17Updated 2 years ago
martiansideofthemoon / relic-retrieval
Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).
☆20Updated 3 years ago
nlp-stat-test / nlp-stat-test
The NLPStatTest project
☆12Updated 3 years ago
MilaNLProc / language-invariant-properties
☆22Updated 3 years ago
HKUST-KnowComp / MLMA_hate_speech
Dataset and code of our EMNLP 2019 paper "Multilingual and Multi-Aspect Hate Speech Analysis"
☆56Updated 7 months ago
pdufter / staticlama
☆13Updated 4 years ago
adapter-hub / hgiyt
Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"
☆27Updated 3 years ago
ahmetustun / udapter
UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi…
☆31Updated 2 years ago
AkshitaJha / NLP_CSS_2017
☆10Updated 6 years ago
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆30Updated 3 years ago
ivanmontero / autobot
Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models'
☆17Updated 3 years ago
amazon-science / contrastive-controlled-mt
Code and data for the IWSLT 2022 shared task on Formality Control for SLT
☆21Updated 2 years ago
TurkuNLP / wikibert
BERT models for many languages created from Wikipedia texts
☆33Updated 5 years ago
flipz357 / S3BERT
Semantically Structured Sentence Embeddings
☆66Updated 8 months ago
SemEval / SemEval2021
☆29Updated 3 years ago
BinWang28 / Sentence-Embedding-S3E
Efficient Sentence Embedding via Semantic Subspace Analysis
☆14Updated 5 years ago
alexwarstadt / data_generation
☆29Updated last year
langtech-bsc / mt-evaluation
A framework for evaluating Machine Translation models.
☆9Updated last month
google-research-datasets / wikifact
Wikipedia based dataset to train relationship classifiers and fact extraction models
☆25Updated 4 years ago
timoschick / form-context-model
This repository contains the code for the Form-Context Model and its Attentive Mimicking variant.
☆31Updated 5 years ago
anthonywchen / MOCHA
Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics".
☆16Updated 3 years ago
adalmia96 / Cluster-Analysis
☆54Updated 3 years ago
LuisaMaerz / KnowMAN
KnowMAN: Weakly Supervised Multinomial Adversarial Networks
☆12Updated 3 years ago
yanaiela / TNE
Codebase for the Text-based NP Enrichment (TNE) paper
☆20Updated last year
uclnlp / APE
Adaptive Passage Encoder for Open-domain Question Answering
☆15Updated 4 years ago
rudinger / defeasible-nli
Defeasible Natural Language Inference
☆12Updated 4 years ago
bloomberg / entsum
Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization
☆27Updated 3 years ago
tomhosking / torchseq
PyTorch Seq2Seq framework
☆27Updated 8 months ago
nyu-mll / pretraining-learning-curves
The repository for the paper "When Do You Need Billions of Words of Pretraining Data?"
☆21Updated 4 years ago
google-research-datasets / MultiReQA
We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest…
☆31Updated 4 years ago