hartwork / surrogates
Encode and decode pairs of surrogate characters in Python 3
☆10 · Updated 3 years ago
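As a rough illustration of what encoding and decoding surrogate pairs means in Python 3, here is a minimal sketch that uses only the standard library. It does not use or reproduce the API of the surrogates package; the function names `encode_surrogates` and `decode_surrogates` are hypothetical and chosen for this example only.

```python
def encode_surrogates(text: str) -> str:
    """Replace every code point above U+FFFF with a UTF-16 surrogate pair.

    Hypothetical helper for illustration; not the surrogates package API.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high (lead) surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low (trail) surrogate
        else:
            out.append(ch)
    return "".join(out)


def decode_surrogates(text: str) -> str:
    """Combine surrogate pairs back into single code points.

    The "surrogatepass" error handler lets the surrogates through the
    encoder; the UTF-16 decoder then joins each valid pair. A lone
    (unpaired) surrogate makes the decode step raise UnicodeDecodeError.
    """
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    emoji = "\U0001F600"
    pair = encode_surrogates(emoji)
    print([hex(ord(c)) for c in pair])
    print(decode_surrogates(pair) == emoji)
```

Round-tripping through UTF-16 keeps the decoder short, at the cost of rejecting unpaired surrogates instead of passing them through.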
Alternatives and similar repositories for surrogates
Users who are interested in surrogates compare it to the libraries listed below.
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆16 · Updated 6 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 · Updated last year
- Python bindings for the fast integer compression library FastPFor. ☆59 · Updated last year
- Simplifying parsing of large jsonline files in NLP Workflows ☆12 · Updated 3 years ago
- This is an Object Oriented implementation of a Trie in python. The class contains setter and getter methods, and implements several usefu… ☆15 · Updated 7 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆17 · Updated 3 years ago
- Analyze trends in articles published on arXiv ☆17 · Updated 2 years ago
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆19 · Updated 2 months ago
- Python package used to apply NLP interactive clustering methods. ☆10 · Updated last year
- Python module (C extension and plain python) implementing DAWG ☆20 · Updated 3 years ago
- Random program generator for Python ☆10 · Updated 12 years ago
- Prefetch elements from a Python generator in the background, from a separate process ☆34 · Updated 4 years ago
- Source code for my paper "Matrix Differential Calculus with Tensors (for Machine Learning)" ☆12 · Updated 8 years ago
- 🧬 Modularised Evolutionary Algorithms For Python with Optional JIT and Multiprocessing (Ray) support. Inspired by PyTorch Lightning ☆53 · Updated 2 years ago
- LEMON: Explainable Entity Matching ☆18 · Updated 3 years ago
- CyDifflib is a fast implementation of difflib's algorithms, which can be used as a drop-in replacement. ☆24 · Updated 2 months ago
- framework for making streamcorpus data ☆11 · Updated 8 years ago
- a pure-Python PATRICIA trie implementation. ☆30 · Updated 10 years ago
- A C++ library implementing fast language models estimation using the 1-Sort algorithm. ☆17 · Updated 2 years ago
- Large-scale query-focused multi-document Summarization dataset ☆10 · Updated 3 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated 2 years ago
- [Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi… ☆12 · Updated 9 months ago
- a graph definition and execution library for python ☆16 · Updated 2 years ago
- Code for "CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection" (V. Blasch… ☆9 · Updated 4 years ago
- A lightweight tool to measure the full memory of a Python session ☆19 · Updated 4 months ago
- Python 3 library to store memory mappable objects into pickle-compatible files ☆38 · Updated 7 years ago
- Highly specialized crate to parse and use `google/sentencepiece` 's precompiled_charsmap in `tokenizers` ☆18 · Updated 3 years ago
- ☆19 · Updated 5 years ago
- Support files exposing JSON from the JSON Schema specifications to Python ☆12 · Updated this week
- Python bindings for MetroHash ☆19 · Updated 2 months ago