Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing
☆76 · May 1, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for retrie
Users interested in retrie are comparing it to the libraries listed below.
- A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database. ☆20 · Jul 5, 2024 · Updated last year
- A Python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- Google Tink's critical Ed25519 bug related to Java "final" keyword ☆11 · Apr 5, 2020 · Updated 6 years ago
- A public repository for corrupt0 datathon's court data ☆11 · Jul 6, 2019 · Updated 6 years ago
- A Reddit bot that finds original publish dates on linked articles. ☆10 · Nov 30, 2024 · Updated last year
- ☯️ AllenNLP training configurations for promising models on Named Entity Recognition. (BiLSTM-CRF, BiLSTM-CNN-CRF, BERT, BERT-CRF) ☆15 · Nov 26, 2020 · Updated 5 years ago
- Tools for compiling corpora from Common Crawl ☆14 · Nov 24, 2024 · Updated last year
- Thai PDPA Website (Unofficial) ☆11 · Jun 10, 2023 · Updated 2 years ago
- The home repository of the NerKor corpus, a Hungarian gold standard named entity annotated corpus containing 1 million tokens. ☆16 · Sep 20, 2023 · Updated 2 years ago
- MDLText ☆12 · Jul 13, 2017 · Updated 8 years ago
- Efficient string matching with regular expressions ☆146 · May 12, 2026 · Updated last week
- Standalone Dictionary-based, Maximum Matching + Thai Character Cluster (newmm) tokenizer extracted from PyThaiNLP ☆13 · Jan 6, 2022 · Updated 4 years ago
- Code and dataset for the EMNLP 2024 paper: GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory ☆49 · Sep 26, 2024 · Updated last year
- Alternative robots parser module for Python ☆22 · Apr 8, 2026 · Updated last month
- A microservice that compiles *TeX files via HTTP ☆13 · Mar 11, 2018 · Updated 8 years ago
- Starter repo for regl explorations ☆10 · May 26, 2017 · Updated 8 years ago
- e-magyar text processing system -- inter-module communication via tsv + REST API ☆31 · Aug 23, 2025 · Updated 8 months ago
- ChatGPT with access to the internet ☆25 · Jun 16, 2023 · Updated 2 years ago
- Speed testing for a data munging task ☆47 · Feb 23, 2013 · Updated 13 years ago
- Hungarian morphological generator ☆16 · Dec 12, 2025 · Updated 5 months ago
- Code and data used to build a training dataset for dragnet models ☆10 · Nov 29, 2020 · Updated 5 years ago
- epsilon is a scanner generator ☆29 · Jun 12, 2022 · Updated 3 years ago
- ☆21 · Aug 19, 2024 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆110 · May 16, 2024 · Updated 2 years ago
- This packages up data for the Open Multilingual Wordnet ☆69 · Mar 28, 2026 · Updated last month
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆199 · Jun 6, 2025 · Updated 11 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆59 · May 3, 2024 · Updated 2 years ago
- ☆11 · Mar 16, 2018 · Updated 8 years ago
- "Learning What is Essential in Questions", CoNLL, 2017 ☆26 · Aug 3, 2018 · Updated 7 years ago
- Small string compression using the smaz compression algorithm. Fast, because it's in C. Supports Python 3+