An attempt at creating a gold-standard dataset for backtesting yesterday's and today's content extractors
☆35 Mar 19, 2015 Updated 11 years ago
Alternatives and similar repositories for crawl-to-the-future
Users who are interested in crawl-to-the-future are comparing it to the libraries listed below.
- Analysis related to article on FOIA Online Database. ☆11 Feb 2, 2017 Updated 9 years ago
- get facebook data ☆10 Sep 14, 2014 Updated 11 years ago
- Migrating to https://github.com/origamitower/folktale ☆20 Sep 6, 2016 Updated 9 years ago
- ☆16 Jun 7, 2018 Updated 8 years ago
- Replication files for the March 2, 2015 Barron's story "The Little Guy Wins!," measuring market makers' trade execution quality. ☆13 Mar 12, 2015 Updated 11 years ago
- Adds read support for Excel files (xls and xlsx) to agate. ☆18 Jun 8, 2026 Updated last week
- Extract data from websites using basic statistical magic ☆506 Oct 2, 2020 Updated 5 years ago
- The missing datasets manager. Like Homebrew but for datasets. CLI tool for searching and discovering datasets! ☆41 May 29, 2017 Updated 9 years ago
- Obtained in December 2014 through a Freedom of Information request ☆15 Jan 29, 2016 Updated 10 years ago
- Parse live video and extract Chyron text ☆20 Aug 17, 2017 Updated 8 years ago
- Python's missing statistical Swiss Army knife ☆15 Aug 25, 2015 Updated 10 years ago
- A Lit web-component for viewing a Whisper JSON transcript file ☆14 Feb 12, 2026 Updated 4 months ago
- This project deals with hierarchical classification of web pages based on dmoz dataset. ☆14 Apr 10, 2014 Updated 12 years ago
- A distributed in-memory fabric based on shared-memory blocks and datashape. Any language can operate on the data. ☆13 Feb 12, 2016 Updated 10 years ago
- pythonic processes ☆12 Jun 12, 2015 Updated 11 years ago
- RWA recurrent neural networks ☆18 Apr 14, 2017 Updated 9 years ago
- A how-to on mass collection of FEC data using the command line and regular expressions ☆29 Mar 18, 2016 Updated 10 years ago
- Manage and load dataprotocols.org Data Packages ☆27 Sep 17, 2015 Updated 10 years ago
- Tweets annotated with coarse-grained sense labels (supersenses) ☆13 Jun 13, 2014 Updated 12 years ago
- Links parts of input text to Wikipedia articles ☆16 Sep 9, 2012 Updated 13 years ago
- Failover AWS Spot Instances ☆11 Dec 8, 2017 Updated 8 years ago
- A clone of the Windows Snipping Tool for imgur! ☆12 Apr 1, 2014 Updated 12 years ago
- A Flask+Elasticsearch UI for exploring the DC Inbox dataset from http://web.stevens.edu/dcinbox/Home.html ☆17 Jan 21, 2022 Updated 4 years ago
- A semantic web crawler ☆20 Sep 20, 2010 Updated 15 years ago
- System for mining Wikipedia Usage data to read our collective mind