MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆52 Updated last month
Alternatives and similar repositories for tokenlearn:
Users who are interested in tokenlearn are comparing it to the libraries listed below.
- ☆40 Updated 2 months ago
- Efficient few-shot learning with cross-encoders. ☆50 Updated last year
- NLP with Rust for Python 🦀🐍 ☆61 Updated 10 months ago
- ☆44 Updated last month
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval" ☆24 Updated last month
- Generalist and Lightweight Model for Text Classification ☆115 Updated this week
- minimal pytorch implementation of bm25 (with sparse tensors) ☆100 Updated last year
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆29 Updated this week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆59 Updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆67 Updated this week
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆62 Updated 8 months ago
- ☆44 Updated 2 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆80 Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆48 Updated last year
- ☆17 Updated this week
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 Updated 5 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 Updated last year
- ☆47 Updated last year
- ☆23 Updated this week
- ☆48 Updated 5 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 Updated last year
- A RAG that can scale 🧑🏻‍💻 ☆11 Updated 10 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆19 Updated 2 months ago
- My NER Experiments with ModernBERT ☆18 Updated 3 months ago
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆63 Updated 2 months ago
- PyTorch implementation for MRL ☆18 Updated last year
- "Syntriever: How to Train Your Retriever with Synthetic Data from LLMs" the Nations of the Americas Chapter of the Association for Comput… ☆24 Updated last month
- XTR/WARP is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆122 Updated 5 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆24 Updated 4 months ago