MeetElise / surprise-similarity
A context-aware embedding similarity score
☆11 · Updated last year
Alternatives and similar repositories for surprise-similarity:
Users who are interested in surprise-similarity are comparing it to the libraries listed below.
- Pre-train Static Word Embeddings ☆42 · Updated this week
- Code for SaGe subword tokenizer (EACL 2023) ☆22 · Updated 2 months ago
- Efficient few-shot learning with cross-encoders. ☆44 · Updated 11 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 · Updated 5 months ago
- Search through Facebook Research's PyTorch BigGraph Wikidata dataset with the Weaviate vector search engine ☆31 · Updated 3 years ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆17 · Updated 3 months ago
- Generalist and Lightweight Model for Text Classification ☆59 · Updated last week
- Wikipedia text corpus for self-supervised NLP model training ☆41 · Updated 2 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆31 · Updated 8 months ago
- ☆21 · Updated this week
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆27 · Updated last month
- My NER Experiments with ModernBERT ☆15 · Updated 3 weeks ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 · Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆56 · Updated 6 months ago
- 💫 spaCy wrapper for ConceptNet 💫 ☆89 · Updated last year
- PyTorch implementation of a BiLSTM model for the Wikification project. ☆18 · Updated 4 years ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 · Updated last year
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆13 · Updated 7 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆45 · Updated last year
- SpaCyEx allows the creation of spaCy Matcher patterns with regex-like syntax. ☆59 · Updated 8 months ago
- Using short models to classify long texts ☆21 · Updated last year
- RaKUn 2.0 - A fast keyword detection algorithm ☆64 · Updated this week
- Robust and fast topic models with sentence-transformers. ☆42 · Updated 3 weeks ago
- ☆45 · Updated 2 years ago
- Metadata Extractor & Loader (MEL) ■ The NLP-NER Toolkit (TNNT) ☆22 · Updated last year
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆97 · Updated 10 months ago
- Tools to make language models a bit easier to use ☆33 · Updated last week
- A News Article Collection Library ☆22 · Updated last year
- A RAG that can scale 🧑🏻‍💻 ☆11 · Updated 8 months ago
- Completion After Prompt Probability. Make your LLM make a choice ☆73 · Updated 2 months ago