worldbank/GISTEmbed

GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings

☆45

Alternatives and similar repositories for GISTEmbed

Users that are interested in GISTEmbed are comparing it to the libraries listed below.

LeeSureman / E5-Retrieval-Reproduction
Use contrastive learning to train a large language model (LLM) as a retriever
☆12 · Jul 19, 2024 · Updated 2 years ago
worldbank / wb-nlp-tools
Natural language processing tools developed by the World Bank's DECAT unit. A suite of text preprocessing and cleaning algorithms for NLP…
☆10 · Jun 11, 2022 · Updated 4 years ago
worldbank / wb-nlp-apps
This repository contains the NLP modeling components and web application implementations of a project for knowledge and data discovery fu…
☆13 · Jun 29, 2021 · Updated 5 years ago
Wenjun-Peng / GPT4SM
☆11 · Jun 7, 2023 · Updated 3 years ago
ielab / Starbucks
Starbucks: Improved Training for 2D Matryoshka Embeddings
☆25 · Jun 30, 2025 · Updated last year
hotchpotch / yasem
YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings
☆13 · May 22, 2025 · Updated last year
microsoft / multifield-adaptive-retrieval
Code for the paper "Multi-Field Adaptive Retrieval," a research project on semi-structured document retrieval
☆18 · Feb 13, 2026 · Updated 5 months ago
asahi417 / lm-vocab-trimmer
Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…
☆67 · Oct 25, 2024 · Updated last year
ma787639046 / bowdpr
[SIGIR24] Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval
☆18 · Feb 29, 2024 · Updated 2 years ago
alvarobartt / opentrain
🚂 Fine-tune OpenAI models for text classification, question answering, and more
☆17 · May 1, 2023 · Updated 3 years ago
jakespringer / echo-embeddings
☆168 · Apr 17, 2024 · Updated 2 years ago
Yibin-Lei / MetaEOL
Implementation for ACL 2024 paper "Meta-Task Prompting Elicits Embeddings from Large Language Models"
☆12 · Jul 25, 2024 · Updated 2 years ago
stefan-it / modern-bert-ner
My NER Experiments with ModernBERT and Ettin
☆29 · Jul 17, 2025 · Updated last year
instructkr / reranker-simple-benchmark
Make running benchmarks simple yet maintainable, again. Currently supports only Korean-based cross-encoders.
☆35 · Dec 2, 2025 · Updated 7 months ago
MikeWangWZHL / Zemi
Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings
☆15 · May 3, 2023 · Updated 3 years ago
facebookresearch / ELECTRA-Fewshot-Learning
This repository contains the code for the paper "Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models."
☆48 · Jun 7, 2022 · Updated 4 years ago
thakur-nandan / income
INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.
☆24 · Sep 24, 2023 · Updated 2 years ago
worldbank / metadata-editor-docs
Metadata Editor user and practice guide
☆19 · Jul 9, 2026 · Updated 2 weeks ago
ihsn / nada
National Data Archive (NADA) is an open source data cataloging system that serves as a portal for researchers to browse, search, compare,…
☆51 · Updated this week
pygongnlp / CoSearchAgent
[SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models
☆30 · Feb 15, 2024 · Updated 2 years ago
ritaranx / NeST
[AAAI 2023] This is the code for our paper "Neighborhood-Regularized Self-Training for Learning with Few Labels".
☆12 · Jan 11, 2023 · Updated 3 years ago
hkust-nlp / SynCSE
This is the official implementation of the paper: "Contrastive Learning of Sentence Embeddings from Scratch"
☆40 · Jun 9, 2023 · Updated 3 years ago
LoicGrobol / zeldarose
Train transformer-based models.
☆28 · Apr 12, 2026 · Updated 3 months ago
jensjorisdecorte / Skill-Extraction-benchmark
Dataset used to evaluate Skill Extraction systems based on the ESCO skills taxonomy.
☆17 · Jul 18, 2024 · Updated 2 years ago
krylm / whisper-event-tuning
Final training script from the Hugging Face Whisper fine-tuning event, for getting the best results on the fine-tuned model.
☆12 · Dec 24, 2022 · Updated 3 years ago
OpenBMB / DEBATER
This is the code repo for our paper "Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Searc…
☆26 · Mar 2, 2025 · Updated last year
hotchpotch / yast
YAST - Yet Another SPLADE or Sparse Trainer
☆21 · Jun 16, 2025 · Updated last year
J-Seo / KoCommonGEN-V2
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
☆25 · Aug 24, 2024 · Updated last year
oceanumeric / EnteRAG
A RAG that can scale 🧑🏻‍💻
☆11 · May 28, 2024 · Updated 2 years ago
xinghaow99 / DenoSent
[AAAI 2024] DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning
☆15 · Apr 29, 2024 · Updated 2 years ago
daekeun-ml / evaluate-llm-on-korean-dataset
Performs benchmarking on two Korean datasets with minimal time and effort.
☆45 · Jan 22, 2026 · Updated 6 months ago
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆93 · May 8, 2025 · Updated last year
sparkle-reasoning / sparkle
[NeurIPS'25] Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
☆16 · Dec 12, 2025 · Updated 7 months ago
v-liuwei / USTC-2021Spring-Introduction_to_Deep_Learning
USTC Spring 2021 Introduction to Deep Learning lab assignments: FNN, CNN, RNN, LSTM, BERT, GCN
☆30 · Jun 21, 2021 · Updated 5 years ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81 · Aug 30, 2023 · Updated 2 years ago
Marker-Inc-Korea / AutoRAG-example-korean-embedding-benchmark
AutoRAG example about benchmarking Korean embeddings.
☆46 · Oct 2, 2024 · Updated last year
ottowg / gsap-ner
☆10 · Oct 2, 2024 · Updated last year
MatthewCYM / GenSE
Official implementation of EMNLP 2022 paper "Generate, Discriminate, and Contrast: A Semi-Supervised Sentence Representation Learning Fram…
☆23 · Nov 27, 2022 · Updated 3 years ago
gangiswag / llm-reranker
☆63 · Jan 26, 2025 · Updated last year