worldbank / GISTEmbed
GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings
☆36 Updated 8 months ago
Related projects
Alternatives and complementary repositories for GISTEmbed
- Dense hybrid representations for text retrieval ☆61 Updated last year
- ☆55 Updated last year
- Resources & scripts for the paper "MTEB: Massive Text Embedding Benchmark" ☆15 Updated last month
- Inquisitive Parrots for Search ☆177 Updated 8 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 Updated 7 months ago
- ☆29 Updated 9 months ago
- ☆15 Updated 3 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆44 Updated 11 months ago
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆93 Updated last year
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆72 Updated 2 years ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval ☆37 Updated 2 weeks ago
- ☆45 Updated 2 years ago
- ☆83 Updated 2 months ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆52 Updated 3 months ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 Updated 2 years ago
- Benchmarking library for RAG ☆112 Updated this week
- Finetune mistral-7b-instruct for sentence embeddings ☆70 Updated 6 months ago
- Retrieval-Augmented Generation battle! ☆44 Updated last month
- Using business-level retrieval system (BM25) with Python in just a few lines. ☆31 Updated last year
- A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution ☆30 Updated last year
- ☆14 Updated 8 months ago
- 🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie… ☆49 Updated 2 years ago
- Repo for "On Learning to Summarize with Large Language Models as References" ☆42 Updated last year
- A multilingual version of MS MARCO passage ranking dataset ☆141 Updated last year
- ☆37 Updated last month
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP ☆20 Updated 10 months ago
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆54 Updated 6 months ago
- An easy-to-use python toolkit for flexibly adapting various neural ranking models to any target domain. ☆59 Updated last year
- CLIR version of ColBERT ☆64 Updated last month
- Zero-shot Document Ranking with Large Language Models. ☆95 Updated 4 months ago