MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆49 Updated last week
Alternatives and similar repositories for tokenlearn:
Users who are interested in tokenlearn are comparing it to the libraries listed below.
- NLP with Rust for Python 🦀🐍 ☆61 Updated 9 months ago
- Generalist and Lightweight Model for Text Classification ☆91 Updated this week
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆28 Updated 3 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated last year
- Efficient few-shot learning with cross-encoders. ☆49 Updated last year
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 Updated last year
- ☆47 Updated last year
- A RAG that can scale 🧑🏻‍💻 ☆11 Updated 9 months ago
- My NER Experiments with ModernBERT ☆17 Updated 2 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆59 Updated last year
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆64 Updated last week
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆18 Updated last month
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆61 Updated 7 months ago
- Reference-Free automatic summarization evaluation with potential hallucination detection ☆100 Updated last year
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆80 Updated last year
- Using short models to classify long texts ☆21 Updated 2 years ago
- minimal pytorch implementation of bm25 (with sparse tensors) ☆97 Updated last year
- Library for fast text representation and classification. ☆28 Updated last year
- ☆42 Updated last month
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆46 Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆24 Updated 3 months ago
- ☆48 Updated 4 months ago
- Tools to make language models a bit easier to use ☆39 Updated 2 weeks ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆67 Updated 4 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆26 Updated 11 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 Updated last year