SkobelevIgor / stackexchange-xml-converter
Stack Exchange (e.g., Stack Overflow) data dump converter from XML to CSV format.
☆80 · Updated 3 years ago
Alternatives and similar repositories for stackexchange-xml-converter
Users who are interested in stackexchange-xml-converter are comparing it to the libraries listed below.
- LinkRun - Data Engineering project done in 3 weeks during the Insight fellowship · ☆39 · Updated 5 years ago
- ☆18 · Updated last year
- Convert Wikipedia database dumps into plaintext files · ☆326 · Updated 4 years ago
- The LAW next generation crawler. · ☆88 · Updated 3 years ago
- Python scripts to import StackExchange data dump into Postgres DB. · ☆88 · Updated 3 years ago
- API - extract a list of keywords from a text. · ☆18 · Updated 8 years ago
- Node starter kit for semantic search. Uses Mighty Inference Server with Qdrant vector search. · ☆15 · Updated 2 years ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… · ☆38 · Updated last year
- Common Crawl fork of Apache Nutch · ☆38 · Updated last month
- Parse government documents into well-formed JSON · ☆73 · Updated 2 months ago
- English stopwords collection · ☆163 · Updated 9 years ago
- Add website scraping abilities to Datasette · ☆64 · Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends · ☆57 · Updated last year
- This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions. · ☆125 · Updated last year
- Example of building a working Spanish-to-English translation model with Marian NMT · ☆23 · Updated 5 years ago
- The open-source email parsing microservice for HTTP APIs. Receive email at your project's email address and automatically initiate a JSON… · ☆47 · Updated 8 years ago
- Download subreddit comments · ☆97 · Updated 3 years ago
- Tools to construct and process Common Crawl webgraphs · ☆99 · Updated last week
- This shows how to use GitHub Actions to do periodic data scraping · ☆235 · Updated this week
- Tiny distributed file system like HDFS (and, of course, GFS) · ☆85 · Updated 6 years ago
- Unreliable News Index (for Columbia Journalism Review) · ☆56 · Updated 3 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line · ☆141 · Updated 2 months ago
- Python library for downloading closed captions (subtitles) from YouTube · ☆62 · Updated 2 years ago
- Client-side demo to search in an Elasticsearch server using HTML and JavaScript · ☆16 · Updated 8 years ago
- PDF parser and converter to HTML · ☆89 · Updated last year
- Python implementation of Google PageSpeed Insights · ☆40 · Updated last year
- GeoIP2 - free IP geolocation database. · ☆74 · Updated last week
- This repository contains code to build an MVP search engine with a Google-like interface. · ☆15 · Updated 2 months ago
- Search COVID-19 Open Research Dataset (CORD-19) using Vespa, the open-source big data serving engine. · ☆38 · Updated 3 weeks ago
- Scrape article metadata from major media outlets' websites, including NYT, WaPo, WSJ. Built on top of the Newspaper Python Library (http… · ☆54 · Updated 8 years ago