jamesmishra / mysqldump-to-csv
A quickly-hacked-together Python script to turn mysqldump files to CSV files. Optimized for Wikipedia database dumps.
☆333 · Updated 3 years ago
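The script's own code is not reproduced here. As an illustration only, the sketch below shows the general technique: scan a dump for `INSERT INTO ... VALUES` lines and split each tuple list into CSV rows. It assumes mysqldump's default extended-insert format, in which many rows share one line and strings are single-quoted with backslash escapes. The names `parse_values` and `dump_to_csv` are hypothetical.

```python
import csv
from collections.abc import Iterable
from typing import TextIO


def parse_values(values: str) -> list[list[str]]:
    """Split the text after VALUES, e.g. "(1,'a'),(2,'b');", into rows.

    Sketch only: assumes single-quoted strings with backslash escapes;
    unquoted tokens (numbers, NULL) are kept as their literal text.
    """
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    in_row = in_string = escaped = False
    for ch in values:
        if in_string:
            if escaped:
                # \n and \t become control characters; any other escaped
                # character (\\, \', \") is kept as the bare character.
                field.append({"n": "\n", "t": "\t"}.get(ch, ch))
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == "'":
                in_string = False
            else:
                field.append(ch)
        elif ch == "'":
            in_string = True
        elif ch == "(" and not in_row:
            # Start of a new tuple.
            in_row, row, field = True, [], []
        elif ch == ")" and in_row:
            # End of the tuple: flush the last field and keep the row.
            row.append("".join(field))
            rows.append(row)
            in_row = False
        elif ch == "," and in_row:
            # Field separator inside a tuple (commas between tuples are skipped).
            row.append("".join(field))
            field = []
        elif in_row and not ch.isspace():
            field.append(ch)
    return rows


def dump_to_csv(lines: Iterable[str], out: TextIO) -> None:
    """Write one CSV row per tuple found in INSERT INTO ... VALUES lines."""
    writer = csv.writer(out)
    for line in lines:
        if line.startswith("INSERT INTO"):
            # Everything after the first " VALUES " is the tuple list.
            _, _, values = line.partition(" VALUES ")
            writer.writerows(parse_values(values))
```

For example, `with open("dump.sql", encoding="utf-8") as f: dump_to_csv(f, sys.stdout)` streams the dump one line at a time instead of loading it whole. In this sketch NULL is written as the literal text `NULL`, and escapes other than `\n` and `\t` are passed through as the bare character (so `\0` becomes `0`); a complete converter would need to handle those cases.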
Alternatives and similar repositories for mysqldump-to-csv
Users who are interested in mysqldump-to-csv are comparing it to the repositories listed below.
- Parses log lines from an Apache log · ☆258 · Updated last year
- Converts JSON files to CSV (pulling data from nested structures). Useful for Mongo data · ☆264 · Updated 4 years ago
- A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch · ☆401 · Updated 3 years ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector · ☆161 · Updated 5 years ago
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ · ☆205 · Updated 7 years ago
- Extract countries, regions and cities from a URL or text · ☆217 · Updated 5 years ago
- Language Detection with Infinity-gram · ☆230 · Updated 10 years ago
- Index URLs in Common Crawl · ☆196 · Updated 8 years ago
- "Stop worrying about Elasticsearch analyzers", my therapist says · ☆154 · Updated 4 years ago
- Text classification using Naive Bayes and Elasticsearch · ☆152 · Updated 9 years ago
- A URL tokenizer and token filter plugin for Elasticsearch · ☆63 · Updated 3 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework · ☆167 · Updated 3 years ago
- Load a CSV (or TSV) file into an Elasticsearch instance · ☆62 · Updated 3 years ago
- Send summary messages of your Luigi jobs to Slack · ☆46 · Updated 6 years ago
- A tool that imports raw JSON into ElasticSearch with a single command · ☆67 · Updated 6 years ago
- Analysis and visualization of email data · ☆144 · Updated 7 years ago
- Python interface to the Stanford Named Entity Recognizer · ☆293 · Updated 4 years ago
- Carrot2 plugin for ElasticSearch · ☆291 · Updated 2 years ago
- 2015 CrunchBase Data Export as CSV · ☆164 · Updated 9 years ago
- Python bindings to the Compact Language Detector · ☆33 · Updated 5 years ago
- Refinery - A locally deployable open-source web platform for analysis of large document collections · ☆101 · Updated 9 years ago
- This is a mirror of the script by Giuseppe Attardi, and contains history before the official repo started: https://github.com/attardi/wik… · ☆259 · Updated 9 years ago
- A Twitter crawler in Python · ☆304 · Updated 7 years ago
- Git Support Utilities · ☆81 · Updated 2 years ago
- Command line tool for deduplicating CSV files · ☆431 · Updated 5 years ago
- Automatically extracts and normalizes an online article or blog post publication date · ☆117 · Updated 2 years ago
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP · ☆274 · Updated 3 years ago
- A Python tool for collecting tweets in MongoDB using the search API · ☆80 · Updated 2 years ago
- ☆97 · Updated 4 years ago
- Randomly sample lines from a csv, tsv, or other line-based data file · ☆125 · Updated 10 years ago