pushshift / Parallel-NDJSON-Reader
Parallel NDJSON Reader for Python
☆17 · Updated 6 years ago
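The listing describes the reader in one line and does not say how it works. The sketch below shows one common way to parallelize NDJSON parsing. It is not necessarily how the pushshift code does it. The idea is to split the file into byte ranges and have each worker parse only the lines whose first byte falls inside its range. The names `chunk_bounds`, `parse_range`, and `read_ndjson_parallel` and the `use_processes` flag are hypothetical and are not this repository's API. Only the Python standard library is used.

```python
import json
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical helpers illustrating byte-range chunking; not the repository's API.


def chunk_bounds(path, n_chunks):
    """Split the file into roughly equal, contiguous [start, end) byte ranges."""
    size = os.path.getsize(path)
    step = max(1, size // max(1, n_chunks))
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def parse_range(task):
    """Parse every NDJSON line whose first byte falls inside [start, end)."""
    path, start, end = task
    records = []
    with open(path, "rb") as f:
        if start > 0:
            # Check the byte just before this range. If it is not a newline,
            # we landed mid-line, and that line belongs to the previous chunk.
            f.seek(start - 1)
            if f.read(1) != b"\n":
                f.readline()
        # Any line that *starts* before `end` is ours, even if it ends later.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def read_ndjson_parallel(path, workers=None, use_processes=True):
    """Read an NDJSON file in parallel and return records in file order."""
    workers = workers or os.cpu_count() or 1
    tasks = [(path, s, e) for s, e in chunk_bounds(path, workers)]
    # Processes sidestep the GIL for CPU-bound json parsing; threads are
    # offered as a fallback for environments where spawning is awkward.
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    records = []
    with pool_cls(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so chunks stay ordered.
        for chunk in pool.map(parse_range, tasks):
            records.extend(chunk)
    return records


if __name__ == "__main__":
    # Usage demo: write a small NDJSON file and read it back with 4 processes.
    with tempfile.NamedTemporaryFile("w", suffix=".ndjson", delete=False) as tmp:
        for i in range(1000):
            tmp.write(json.dumps({"id": i, "body": "x" * (i % 7)}) + "\n")
    rows = read_ndjson_parallel(tmp.name, workers=4)
    print(len(rows), rows[0], rows[-1])
    os.remove(tmp.name)
```

Deciding chunk ownership by the first byte of a line lets every worker settle its boundaries on its own. A worker only needs to inspect the single byte before its range, so no coordination between workers is required.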
Alternatives and similar repositories for Parallel-NDJSON-Reader
Users who are interested in Parallel-NDJSON-Reader are comparing it to the libraries listed below.
- Read compressed NDJSON .zst files easily · ☆35 · Updated 3 years ago
- A simple command line interface to the datamade/dedupe library. · ☆43 · Updated 3 years ago
- A Python Wrapper To Retrieve Data From The CrowdTangle API · ☆11 · Updated 7 months ago
- Text Thresher crowd sourced text annotator · ☆17 · Updated 8 years ago
- Interpretable data visualizations for understanding how texts differ at the word level · ☆286 · Updated 11 months ago
- ☆76 · Updated this week
- Turning news into events since 2014. · ☆51 · Updated 8 years ago
- Tokenizer for Twitter and Reddit data · ☆45 · Updated 6 years ago
- The documentation and scripts for the Local News Dataset · ☆25 · Updated 3 years ago
- Classify names by gender, U.S. ethnicity, or leaf nationality · ☆19 · Updated 7 years ago
- An implementation of latent Dirichlet allocation in JavaScript · ☆185 · Updated 3 years ago
- Dataframe Integration with spaCy. · ☆103 · Updated 4 years ago
- Fast, flexible name matching for large datasets · ☆71 · Updated 5 months ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. · ☆62 · Updated 9 years ago
- Pushshift Telegram Ingest · ☆85 · Updated 6 years ago
- Python wrapper for a C++ Double Metaphone · ☆15 · Updated 3 weeks ago
- Collecting thoughts about data versioning · ☆108 · Updated 6 years ago
- Ensemble topic modelling with pLSA · ☆114 · Updated 4 years ago
- Tool for probabilistically linking the records of individual entities (e.g. people) within and across datasets · ☆118 · Updated 2 months ago
- A multi-modal Twitter dataset with 7.6M tweets and 25.6M retweets related to voter fraud claims. · ☆53 · Updated 4 years ago
- A Python package for efficient evaluation based on OASIS (Optimal Asymptotic Sequential Importance Sampling). · ☆15 · Updated 4 years ago
- Using stochastic block models for topic modeling · ☆198 · Updated last year
- Datasets of the daily Twitter output of Congress. · ☆115 · Updated 2 years ago
- Public repository containing the dataset and code for training the models in "Ten Social Dimensions of Conversations and Relationships" (… · ☆14 · Updated 4 years ago
- Data and code for analyzing language associated with fictional characters. · ☆15 · Updated 8 years ago
- A Docker image for the CLIFF geolocation software. · ☆10 · Updated 7 years ago
- Package for performing Reddit-based text analysis · ☆20 · Updated 7 years ago
- Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job. · ☆74 · Updated last year
- Tools to download and process name data from various sources. · ☆91 · Updated 12 years ago
- Group thousands of similar spreadsheet or database text entries in seconds · ☆158 · Updated 2 years ago