pushshift / rinzler
A high performance indexing and search system for managing big data
☆17 · Updated 6 years ago
Alternatives and similar repositories for rinzler:
Users that are interested in rinzler are comparing it to the libraries listed below
- Read compressed NDJSON .zst files easily ☆32 · Updated 2 years ago
- Fast and customizable tokenization ☆64 · Updated 5 years ago
- Count-Min Tree Sketch: Approximate counting for NLP ☆10 · Updated 7 years ago
- Facebook fastText database in SQLite with Go API ☆34 · Updated 4 years ago
- BottomK minwise hashing for streaming set similarity ☆43 · Updated 6 years ago
- Read and use word2vec vectors in Go ☆56 · Updated 6 years ago
- Script to extract highly probable bots for further analysis ☆12 · Updated 7 years ago
- Tool for computing continuous distributed representations of words. Modified to learn N-Grams ☆15 · Updated 8 years ago
- Yet Another (natural language) Parser ☆43 · Updated 5 years ago
- Official details for: [1803.08493] Context is Everything: Finding Meaning Statistically in Semantic Spaces ☆39 · Updated 5 years ago
- Stanford Tregex-inspired language for rule-based dependency tree manipulation. ☆21 · Updated 8 years ago
- Socially-Equitable Language Identification ☆78 · Updated 2 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆169 · Updated 3 years ago
- A streaming ETL for fish ☆13 · Updated 6 years ago
- High performance implementations of gradient boosting, random forests, etc. in Go ☆61 · Updated 11 years ago
- Fast identification of character sequences in text or documents (multi-lingual) ☆18 · Updated 9 years ago
- A pipeline for detecting novel information about entities from a stream of text, updating a knowledge base about the entities, and genera… ☆32 · Updated 5 years ago
- Burglary prediction for mortals ☆10 · Updated 11 months ago
- ActivityStreams 2.0 encoding/decoding for Go 1.18+ ☆12 · Updated 6 months ago
- sparse levenshtein automaton in go ☆24 · Updated 4 years ago
- A spell-checker extending Peter Norvig's with multi-typo correction, hamming distance weighting, and more. ☆98 · Updated 4 years ago
- Python bindings to the Compact Language Detector ☆33 · Updated 4 years ago
- ☆10 · Updated 5 years ago
- A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived'] ☆82 · Updated 8 years ago
- A tool for learning significant phrase/term models, and efficiently labeling with them. ☆33 · Updated 2 years ago
- ALMa (Active Learning Manager) Keeps track of labeled and unlabeled data for active learning ☆41 · Updated 4 years ago
- 🕸 A simple way to extract data from Common Crawl ☆34 · Updated 5 years ago
- Python and pandas tools to perform various analyses on different types of word lists ☆16 · Updated 10 years ago
- Miscellaneous tools for processing WARC files from the CommonCrawl ☆24 · Updated 11 years ago
- Spell correct entire sentences using nltk freqdist and symspell ☆19 · Updated 7 years ago