hartwork / surrogates
Encode and decode pairs of surrogate characters in Python 3
☆11 Updated 2 years ago
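As a rough illustration of what encoding and decoding surrogate pairs means in Python 3, here is a minimal sketch that uses only the standard library. The function names `encode_pairs` and `decode_pairs` are hypothetical and are not taken from the surrogates package, whose actual API may differ.

```python
def encode_pairs(text: str) -> str:
    """Replace each character above U+FFFF with its UTF-16 surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high (lead) surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low (trail) surrogate
        else:
            out.append(ch)
    return "".join(out)


def decode_pairs(text: str) -> str:
    """Combine surrogate pairs back into single characters.

    'surrogatepass' lets the encoder emit the surrogates as UTF-16 code
    units; the strict decoder then joins each valid pair and raises
    UnicodeDecodeError on an unpaired surrogate.
    """
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    original = "a\U0001F600"
    paired = encode_pairs(original)
    print(len(original), len(paired))
    print(decode_pairs(paired) == original)
```

The encoder does the arithmetic by hand because a UTF-16 codec round trip would recombine the pair on decode, which is exactly the behavior the decoder relies on.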
Related projects
Alternatives and complementary repositories for surrogates
- Fast and thread-safe C++11 implementation of the Aho-Corasick algorithm. ☆9 Updated 4 years ago
- Correction of spaces with character-based neural language models. ☆13 Updated 2 years ago
- Scripts supporting the development and serving the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search ☆10 Updated last year
- Python module (C extension and plain python) implementing DAWG ☆20 Updated 2 years ago
- Code for "CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection" (V. Blasch… ☆9 Updated 4 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆16 Updated 2 years ago
- An index data structure for approximate string search. ☆23 Updated 5 years ago
- Large-scale query-focused multi-document Summarization dataset ☆10 Updated 3 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆16 Updated last year
- An open-source NLP library: fast text cleaning and preprocessing ☆23 Updated 3 years ago
- Source code and data for Like a Good Nearest Neighbor ☆28 Updated 9 months ago
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆17 Updated 6 years ago
- python3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse… ☆24 Updated last year
- Custom Python functions for working with SQLite FTS4 ☆22 Updated 2 years ago
- Python binding of cedar (implementation of efficiently-updatable double-array trie) using Cython ☆17 Updated 4 years ago
- An efficient data structure for fast string similarity searches ☆23 Updated 3 years ago
- bin files ☆13 Updated 2 months ago
- A simple library for training named entity recognition model from partially annotated data ☆21 Updated last year
- Hugging Face and Pyserini interoperability ☆19 Updated last year
- Analyze trends in articles published on arXiv ☆15 Updated last year
- framework for making streamcorpus data ☆11 Updated 7 years ago
- An author identification system based on recur ☆21 Updated 7 years ago
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆17 Updated 2 years ago
- Vendy is a tool for vendoring third-party packages into your project. ☆15 Updated 11 months ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ☆21 Updated 6 months ago
- Python bindings for the fast integer compression library FastPFor. ☆57 Updated last year
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated last year
- Nvidia GPU Fan Controller for linux ☆16 Updated 5 months ago
- A python package to simulate typographical errors. ☆31 Updated 11 months ago
- CyDifflib is a fast implementation of difflib's algorithms, which can be used as a drop-in replacement. ☆18 Updated 9 months ago