ChenghaoMou / embeddings
zero-vocab or low-vocab embeddings
☆17Updated 2 years ago
Related projects ⓘ
Alternatives and complementary repositories for embeddings
- Combining encoder-based language models☆11Updated 3 years ago
- A tiny BERT for low-resource monolingual models☆29Updated last month
- ☆12Updated 10 months ago
- LAReQA is a challenging benchmark for evaluating language agnostic answer retrieval from a multilingual candidate pool. This repository c…☆14Updated 4 years ago
- This repositary hosts my experiments for the project, I did with OffNote Labs.☆11Updated 3 years ago
- ☆73Updated 3 years ago
- BERT models for many languages created from Wikipedia texts☆34Updated 4 years ago
- Pytorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks☆63Updated 2 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆26Updated 3 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer☆55Updated 2 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021)☆46Updated 3 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest…☆30Updated 4 years ago
- ☆11Updated 4 months ago
- A simple neural truecaser written in pytorch and allennlp.☆32Updated 5 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆31Updated 2 years ago
- Open source library for few shot NLP☆77Updated last year
- Shared code for training sentence embeddings with Flax / JAX☆27Updated 3 years ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).☆20Updated 2 years ago
- ☆19Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP☆58Updated 2 years ago
- This repository contains the code for paper Prompting ELECTRA Few-Shot Learning with Discriminative Pre-Trained Models.☆45Updated 2 years ago
- PyTorch reimplementation of REALM and ORQA☆22Updated 2 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language☆70Updated 8 months ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval☆27Updated 2 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.☆72Updated 2 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in Pytorch☆75Updated 3 years ago
- 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.☆82Updated 2 years ago
- This is the official implementation of NeurIPS 2021 "One Question Answering Model for Many Languages with Cross-lingual Dense Passage Ret…☆70Updated 2 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.☆43Updated 6 months ago