aiintelligentsystems / next-level-bert
☆16Updated 7 months ago
Alternatives and similar repositories for next-level-bert:
Users that are interested in next-level-bert are comparing it to the libraries listed below
- Starbucks: Improved Training for 2D Matryoshka Embeddings☆17Updated 3 months ago
- Embedding Recycling for Language models☆38Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆45Updated last year
- ☆12Updated 4 months ago
- The code related to the baselines from NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark."☆36Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning☆29Updated 2 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval☆28Updated 2 years ago
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)☆66Updated 2 years ago
- Few-shot Learning with Auxiliary Data☆26Updated last year
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 3 years ago
- LTG-Bert☆29Updated last year
- INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.☆22Updated last year
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"☆15Updated 11 months ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval☆40Updated 3 months ago
- Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers☆48Updated last year
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).☆20Updated 2 years ago
- Interpreting Language Models with Contrastive Explanations (EMNLP 2022 Best Paper Honorable Mention)☆61Updated 2 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"☆29Updated 3 months ago
- Code for SaGe subword tokenizer (EACL 2023)☆22Updated 2 months ago
- ☆11Updated last year
- ☆53Updated 3 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…☆43Updated 5 months ago
- ☆13Updated last year
- Index of URLs to pdf files all over the internet and scripts☆21Updated last year
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages (ACL 2022)☆19Updated 2 years ago
- Repository for Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts, EMNLP22☆18Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.☆93Updated last year
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings☆55Updated 7 months ago
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…☆22Updated 3 years ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.☆51Updated last year