jedireza / warc
A Rust library for reading and writing WARC files
☆52 · Updated 6 months ago
Alternatives and similar repositories for warc
Users that are interested in warc are comparing it to the libraries listed below
- Fast hierarchical agglomerative clustering in Rust. ☆96 · Updated last month
- Spelling correction & Fuzzy search based on Symmetric Delete spelling correction algorithm. ☆136 · Updated 4 months ago
- ☆47 · Updated 2 years ago
- 🗄️ A simple CLI for converting WARC to Parquet. ☆110 · Updated 3 months ago
- ☆66 · Updated 2 years ago
- Xor filters - efficient probabilistic hashsets. Faster and smaller than bloom and cuckoo filters. ☆139 · Updated last year
- Fast English word segmentation in Rust ☆99 · Updated 2 weeks ago
- Rust client for txtai ☆111 · Updated last month
- Fast item-to-item recommendations on the command line. ☆37 · Updated 2 years ago
- A collection of small notes that aren't appropriate for my blog. ☆32 · Updated 2 years ago
- Rust external sort algorithm implementation ☆17 · Updated 3 weeks ago
- A vectorized JSON parser for pre-validated, minified documents ☆83 · Updated 10 months ago
- finalfusion embeddings in Rust ☆101 · Updated last year
- Rust implementation of JMESPath, a query language for JSON ☆138 · Updated 8 months ago
- A Rust crate providing fuzzy search/string matching using N-grams ☆30 · Updated 10 months ago
- Generic implementations of clustering algorithms. ☆21 · Updated 7 years ago
- Multilingual implementation of RAKE algorithm for Rust ☆34 · Updated 3 months ago
- Rust implementation of Simhash ☆22 · Updated 2 years ago
- Rust library to find links such as URLs and email addresses in plain text, handling surrounding punctuation correctly ☆220 · Updated last month
- Rust library for asynchronous stream (de)serialization ☆24 · Updated 3 months ago
- Lightweight FST-based autocompleter library written in Rust, targeting WebAssembly and data stored in-memory ☆32 · Updated 2 years ago
- ☆21 · Updated 7 years ago
- Native Rust port of Google's HighwayHash, which makes use of SIMD instructions for a fast and strong hash function ☆165 · Updated 2 months ago
- Rust wrapper for the BlingFire tokenization library ☆15 · Updated 4 years ago
- Rust library for generating non-sequential, tightly-packed short IDs. ☆39 · Updated last year
- Fast approximate nearest neighbor searching in Rust, based on HNSW index ☆326 · Updated 2 weeks ago
- Texting Robots: A Rust native `robots.txt` parser with thorough unit testing ☆28 · Updated last year
- A Rust port of the WebGraph framework ☆53 · Updated last week
- Common stop words in a variety of languages ☆21 · Updated 3 months ago
- A fast, offline, reverse geocoder ☆131 · Updated last month