google-research / url-nlp
☆204 Updated 2 weeks ago
Alternatives and similar repositories for url-nlp:
Users who are interested in url-nlp are comparing it to the libraries listed below:
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆99 Updated 10 months ago
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks. ☆99 Updated last week
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆71 Updated last year
- A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings. ☆71 Updated 7 months ago
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… ☆267 Updated last month
- NTREX -- News Test References for MT Evaluation ☆81 Updated 9 months ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆55 Updated 2 years ago
- The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper https://arxiv.org/abs/2212.01349) ☆157 Updated 2 years ago
- A tool that locates, downloads, and extracts machine translation corpora ☆151 Updated this week
- PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an… ☆271 Updated 2 years ago
- Multilingual Large Language Models Evaluation Benchmark ☆118 Updated 6 months ago
- A neural word aligner based on multilingual BERT ☆339 Updated 3 years ago
- The FLORES+ Machine Translation Benchmark ☆101 Updated 4 months ago
- A Multilingual Replicable Instruction-Following Model ☆93 Updated last year
- The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization ☆157 Updated 2 years ago
- A tool for calculating character n-gram F score ☆70 Updated 2 years ago
- The Benchmark of Linguistic Minimal Pairs ☆149 Updated 2 years ago
- The pipeline for the OSCAR corpus ☆166 Updated last year
- Repository to collect and categorize Grammatical Error Correction papers. ☆116 Updated 4 months ago
- GEMBA — GPT Estimation Metric Based Assessment ☆113 Updated 7 months ago
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning ☆101 Updated 4 years ago
- Code and data accompanying our ACL 2020 paper, "Unsupervised Domain Clusters in Pretrained Language Models". ☆58 Updated 4 years ago
- A repository with the code related to experiments around context-aware machine translation ☆48 Updated 2 years ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆211 Updated 3 months ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆78 Updated 6 months ago