Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing
☆76 · Apr 1, 2026 · Updated last week
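The description above names two ideas: compiling a word list into a single trie-shaped regex, and replacing strings from a mapping in one pass. The following is a minimal standalone sketch of both, using only Python's `re` module. It does not use or reproduce retrie's own API. The helper names `build_trie`, `trie_to_pattern`, and `make_replacer` are hypothetical, and the sketch assumes a non-empty mapping with non-empty keys.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker
    return trie


def trie_to_pattern(node):
    """Convert a trie node into a regex fragment that shares common prefixes."""
    is_end = "" in node
    branches = [
        re.escape(ch) + trie_to_pattern(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word ends here, the continuation is optional; "?" is greedy,
    # so longer words are preferred over their prefixes.
    return body + "?" if is_end else body


def make_replacer(mapping):
    """Return a function that applies every replacement in a single regex pass."""
    pattern = re.compile(trie_to_pattern(build_trie(mapping)))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


replace = make_replacer({"cat": "dog", "cats": "dogs", "car": "bus"})
print(replace("cats and a car, a cat"))
```

The sketch adds no word boundaries, so a key also matches inside longer words. Wrapping the pattern in `\b` anchors is one possible adjustment for blacklist-style filtering.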
Alternatives and similar repositories for retrie
Users that are interested in retrie are comparing it to the libraries listed below.
- A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database. ☆20 · Jul 5, 2024 · Updated last year
- Basis of FragDenStaat.de's "Koalitionstracker" ☆15 · Jul 14, 2025 · Updated 8 months ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆30 · Nov 18, 2025 · Updated 4 months ago
- ☯️ AllenNLP training configurations for promising models on Named Entity Recognition. (BiLSTM-CRF, BiLSTM-CNN-CRF, BERT, BERT-CRF) ☆15 · Nov 26, 2020 · Updated 5 years ago
- Notes on papers in Natural Language Processing, Computational Linguistics, and the related sciences ☆14 · Mar 30, 2026 · Updated last week
- Tools for compiling corpora from Common Crawl ☆14 · Nov 24, 2024 · Updated last year
- A companion repository to the "You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source"… ☆20 · Oct 14, 2022 · Updated 3 years ago
- The home repository of the NerKor corpus, a Hungarian gold standard named entity annotated corpus containing 1 million tokens. ☆16 · Sep 20, 2023 · Updated 2 years ago
- Standalone Dictionary-based, Maximum Matching + Thai Character Cluster (newmm) tokenizer extracted from PyThaiNLP ☆13 · Jan 6, 2022 · Updated 4 years ago
- Searching in-memory corpus with Corpus Query Language (CQL) ☆19 · Dec 2, 2024 · Updated last year
- Read fixed width data files with Python 3 ☆14 · Mar 20, 2026 · Updated 3 weeks ago
- VS Code extension that adds syntax highlighting for ssh config files ☆11 · Nov 17, 2025 · Updated 4 months ago
- Some useful scripts to run ipptool commands against printers ☆12 · Feb 8, 2017 · Updated 9 years ago
- Plot charts from arbtt-stats to terminal ☆17 · Jun 16, 2024 · Updated last year
- 📔️ Generate a text-based journal from a template file. ☆21 · Mar 16, 2021 · Updated 5 years ago
- Slides for an opinionated talk about what it means to be a senior software engineer ☆15 · Jun 17, 2023 · Updated 2 years ago
- Links to export personal data from popular internet services ☆22 · Feb 4, 2024 · Updated 2 years ago
- ☆13 · Oct 20, 2022 · Updated 3 years ago
- Alternative robots parser module for Python ☆22 · Mar 1, 2026 · Updated last month
- KL3M training data collection and preprocessing ☆21 · Apr 14, 2025 · Updated 11 months ago
- A microservice that compiles *TeX files via HTTP ☆13 · Mar 11, 2018 · Updated 8 years ago
- Python bindings for the PCRE2 library created by Philip Hazel ☆17 · Mar 17, 2026 · Updated 3 weeks ago
- Legal Code for the State of Utah ☆44 · Apr 8, 2014 · Updated 12 years ago
- Guidelines for Propbank ☆27 · Apr 3, 2023 · Updated 3 years ago
- Project to enable search of key words in text files extracted by the Querido Diário. ☆14 · Jul 15, 2020 · Updated 5 years ago
- ☆12 · Dec 8, 2020 · Updated 5 years ago
- e-magyar text processing system -- inter-module communication via tsv + REST API ☆32 · Aug 23, 2025 · Updated 7 months ago
- The code used to create and update the Open Australian Legal Embeddings, the first open-source embeddings of Australian legislative and j… ☆13 · Feb 17, 2024 · Updated 2 years ago
- ChatGPT with access to the internet ☆26 · Jun 16, 2023 · Updated 2 years ago
- Allows manual adding and editing of Timetracking Entries ☆21 · May 18, 2021 · Updated 4 years ago
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German ☆12 · Feb 27, 2023 · Updated 3 years ago
- MCP server for German legal texts ☆43 · Dec 19, 2025 · Updated 3 months ago
- AUSTENDER OCDS Search API. This portal will provide users of AusTender data with documentation, code examples, bug notifications and feat… ☆19 · Feb 12, 2024 · Updated 2 years ago
- A Directory of Online Newspaper Sources for 70+ Languages ☆31 · Apr 15, 2021 · Updated 4 years ago
- Hungarian morphological generator ☆16 · Dec 12, 2025 · Updated 3 months ago
- This repository provides a clear, educational implementation of Byte Pair Encoding (BPE) tokenization in plain Python. The focus is on al… ☆15 · Aug 28, 2024 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆110 · May 16, 2024 · Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆190 · Jun 6, 2025 · Updated 10 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆59 · May 3, 2024 · Updated last year