uscensusbureau / SABLE
Scraping Assisted by Learning
☆35 Updated 2 weeks ago
Alternatives and similar repositories for SABLE
Users who are interested in SABLE are comparing it to the libraries listed below.
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆189 Updated 4 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆14 Updated 5 months ago
- Machine learning resources ☆13 Updated 7 years ago
- A selection of business datasets ☆18 Updated 6 years ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 Updated 6 years ago
- GraphiPy: Universal Social Data Extractor ☆83 Updated 2 years ago
- A search engine for Open Data ☆56 Updated 2 years ago
- API client for Aleph, supports bulk entity and document upload. ☆28 Updated 10 months ago
- Techniques for Scraping the Web in Python ☆26 Updated 7 years ago
- The CorpWatch API uses automated parsers to extract the subsidiary relationship information from Exhibit 21 of companies' 10-K filings wi… ☆48 Updated 6 months ago
- Source real estate prices from the Common Crawl. ☆27 Updated 6 years ago
- Python 3.x notebooks about real-world data cleaning and visualization ☆72 Updated 9 years ago
- Examples for getting started using https://case.law ☆67 Updated 2 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 Updated last year
- A Python script to retrieve plain text transcripts from YouTube videos ☆27 Updated 8 years ago
- Train a neural network optimized for generating Reddit subreddit posts ☆28 Updated 7 years ago
- How Quartz used AI to help reporters search the Mauritius Leaks ☆47 Updated 6 years ago
- Scrapes sites. Gets news. Eventually events. ☆87 Updated 9 years ago
- An automated, programming-free web scraper for interactive sites ☆111 Updated 2 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Interactive and searchable House staffer directory, based on House disbursement data. ☆29 Updated last year
- A maximum-strength name parser for record linkage. ☆38 Updated 2 months ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 Updated 7 years ago
- Command line tool to convert spreadsheets to databases, made for the UK's Office for National Statistics. ☆80 Updated last year
- Scrapes Google Trends data over long timescales and stitches together for daily data ☆72 Updated 5 years ago
- Code + Jupyter Notebooks for Visualizing Clusters of Clickbait Headlines Using Spark, Word2vec, and Plotly ☆47 Updated 4 years ago
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 Updated 8 years ago
- New repo for projects related to my blog, Probably Overthinking It. ☆18 Updated 3 years ago
- Python wrapper for a C++ Double Metaphone ☆15 Updated 2 weeks ago