hartwork / surrogates
Encode and decode pairs of surrogate characters in Python 3
☆10 · Updated 3 years ago
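To illustrate what encoding and decoding a surrogate pair means, here is a minimal sketch that uses only the Python 3 standard library. It does not use or reproduce the API of the surrogates package; the function names `encode_pair` and `decode_pairs` are hypothetical and chosen for this example only.

```python
def encode_pair(ch: str) -> str:
    """Split one character above U+FFFF into a high/low surrogate pair.

    Hypothetical helper for illustration; not the surrogates package API.
    """
    cp = ord(ch)
    if cp < 0x10000:
        return ch  # characters in the Basic Multilingual Plane need no surrogates
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return chr(high) + chr(low)


def decode_pairs(text: str) -> str:
    """Merge well-formed surrogate pairs back into single characters.

    Uses a UTF-16 round trip; "surrogatepass" lets the encoder accept
    surrogate code points, and the decoder then joins each pair.
    """
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


pair = encode_pair("\U0001F600")
print([hex(ord(c)) for c in pair])
print(decode_pairs(pair) == "\U0001F600")
```

The decoding step assumes the surrogates are correctly paired; a lone surrogate would raise `UnicodeDecodeError` in this sketch.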
Alternatives and similar repositories for surrogates
Users who are interested in surrogates are comparing it to the libraries listed below.
- A Python interface to PISA ☆37 · Updated 4 months ago
- Toolkit for domain-specific information retrieval experimentation ☆19 · Updated this week
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆76 · Updated 2 weeks ago
- Train a model, and detect gibberish strings with it. ☆68 · Updated 3 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ☆21 · Updated last year
- A Fast Levenshtein Distance Library for Python ☆86 · Updated this week
- A utility to split tarballs into smaller pieces while keeping files intact. ☆18 · Updated 3 years ago
- A robust web archive analytics toolkit ☆129 · Updated 3 months ago
- A Python license checker ☆16 · Updated 10 months ago
- INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ. ☆24 · Updated 2 years ago
- The Keep It Simple Software Bill of Material ☆11 · Updated 4 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆45 · Updated last year
- A python package to simulate typographical errors. ☆38 · Updated 2 years ago
- German Text Embedding Clustering Benchmark ☆18 · Updated last year
- Faster Learned Sparse Retrieval with Block-Max Pruning. ACM SIGIR 2024. ☆35 · Updated 3 weeks ago
- A Python utility for indexing file lines. Best demo honourable mention at ECIR 2024. ☆23 · Updated 3 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆27 · Updated last year
- One-stop shop for running and fine-tuning transformer-based language models for retrieval ☆63 · Updated last month
- Library for fast text representation and classification. ☆31 · Updated 2 years ago
- ☆87 · Updated 3 years ago
- 🛠️ Tools for Transformers compression using PyTorch Lightning ⚡ ☆85 · Updated last week
- Podium: a framework agnostic Python NLP library for data loading and preprocessing ☆60 · Updated 3 years ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 · Updated 3 years ago
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆16 · Updated 7 years ago
- A list of multi-vector retrieval resources ☆18 · Updated last year
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆18 · Updated last year
- A Streamlit component for annotating text by text selecting. ☆42 · Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆22 · Updated 7 months ago
- Check for multiple patterns in a single string at the same time: a fast Aho-Corasick algorithm for Python ☆218 · Updated this week
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆47 · Updated 2 years ago