henu / bigjson
Python library that reads JSON files of any size.
☆197Updated last year
Alternatives and similar repositories for bigjson:
Users who are interested in bigjson are comparing it to the libraries listed below.
- A fast streaming JSON parser for Python that generates SAX-like events using yajl☆221Updated 3 months ago
- ☆167Updated 7 months ago
- The most basic Text::Unidecode port (licensed under Artistic License or GPL or GPLv2+ - choose whatever you want)☆65Updated last year
- Python library to simplify working with jsonlines and ndjson data☆276Updated 5 months ago
- Python binding to Poppler-cpp pdf library☆105Updated 4 months ago
- rstr is a helper module for easily generating random strings of various types. It could be useful for fuzz testing, generating dummy data…☆90Updated last year
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe…☆64Updated 2 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3)☆149Updated last year
- Extract city and country mentions from text like GeoText, using FlashText (an Aho-Corasick implementation) instead of regex.☆60Updated this week
- Library for unit extraction - fork of quantulum for Python 3☆135Updated 6 months ago
- Compress responses of your Flask application.☆119Updated last week
- ☆70Updated 2 years ago
- A Python library for working with and comparing language codes.☆340Updated last month
- ☆68Updated 9 months ago
- Python 3 library for reading and writing WARC files☆21Updated 6 years ago
- ndjson with the same interface as the builtin json module☆68Updated 2 years ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing☆68Updated 2 weeks ago
- Pythonic search engine based on PyLucene.☆124Updated 2 months ago
- Language detection using spaCy and fastText☆54Updated last year
- A fully customisable language detection pipeline for spaCy☆93Updated 5 years ago
- An open-source Python package to clean raw text data☆69Updated last year
- URL normalization for Python☆94Updated 2 years ago
- Python Elasticsearch Mock for test purposes☆112Updated 7 months ago
- Cython wrapper on Hunspell Dictionary☆66Updated 6 months ago
- Guess gender from first name in Python 2 and 3☆131Updated 2 years ago
- A fast streaming JSON parser written in Python☆61Updated 3 years ago
- Accurately find/replace/remove emojis in text strings☆160Updated last year
- A Python module to split a file into multiple chunks based on a given size.☆68Updated 3 months ago
- A Python implementation of Lunr.js 🌖☆194Updated 2 weeks ago
- Generates previews of files with cache management☆237Updated 4 months ago